Credit Elasticities in Less-Developed Economies: Implications for Microfinance

Author: Finn Deike

Welcome to this interactive RTutor problem set! It is part of my bachelor’s thesis at Ulm University and is based on the main results of the paper Credit Elasticities in Less-Developed Economies: Implications for Microfinance by Dean S. Karlan (Department of Economics, Yale University) and Jonathan Zinman (Department of Economics, Dartmouth College), published in the American Economic Review, 98(3), in 2008.

The paper, the data and a supplemental appendix are available online at the following websites:
- Paper: https://www.aeaweb.org/articles?id=10.1257/aer.98.3.1040
- Data: https://assets.aeaweb.org/asset-server/articles-attachments/aer/data/june08/20070848_app.pdf
- Appendix: https://assets.aeaweb.org/asset-server/articles-attachments/aer/data/june08/20070848_app.pdf

Exercise Content

  1. Introduction

  2. Experimental Design and Data Overview

  3. Randomization Process Validation

  4. Theoretical Model

4.1 Price Elasticities of Loan Take-Up

4.2 Price Elasticities of Loan Size

  5. Pricing Strategy

  6. Conclusion

  7. References

This interactive problem set reproduces the results of the paper named above using code chunks, theoretical explanations, info boxes and quizzes. The procedure is simple: you solve tasks by entering or editing R code chunks and by answering short quizzes. Before solving a task, press the Edit button to be able to edit the code chunk. After editing the chunk or answering a quiz question, press the Check button to get feedback on whether your answer is correct. If an exercise is more difficult and you need advice, press the Hint button. Besides code chunks and quizzes, the problem set contains info boxes, which provide additional information on variables and explanations of statistical models or R commands.

Good work and patience will be rewarded with awards that include additional information on exercise-related topics!

Exercise 1 – Introduction

“If you go out into the real world, you cannot miss seeing that the poor are poor not because they are untrained or illiterate but because they cannot retain the returns of their labor. They have no control over capital, and it is the ability to control capital that gives people the power to rise out of poverty.”

― Muhammad Yunus, Banker to the Poor: Micro-Lending and the Battle Against World Poverty (“https://microfinancingafrica.org/10-profound-quotes-about-microfinance/”)

Muhammad Yunus is a Bangladeshi social entrepreneur, banker, economist and civil society leader. He is one of the pioneers of microcredit and microfinance and the founder of the “Grameen Bank”, which is also known as “The Bank of the Poor”. In his book “Banker to the Poor” (1997), Yunus describes how he devoted himself to the “Grameen Bank” in order to provide the poorest people of Bangladesh with minuscule loans. The essence of the book is that a small amount of credit can tremendously transform the lives of the poorest people in the world.

The book inspired me to find out more about microcredit lending to the poorest people in the world, and therefore I decided to replicate the paper Credit Elasticities in Less-Developed Economies: Implications for Microfinance by Dean S. Karlan and Jonathan Zinman in my bachelor’s thesis.

Over three billion people in developing countries are still without effective access to loan and deposit services. In Sub-Saharan Africa the problem is especially severe. The region has the lowest level of access to finance of any region in the world. This becomes particularly obvious when looking at the formal relationships between financial institutions and Sub-Saharan households: only between five and twenty-five percent of households have a formal relationship with a financial institution. The banking system is very small and the microfinance sector is stagnating. In particular, just two percent of the world’s microfinance institutions are based in this part of Africa. Lack of access to financial services is therefore one of the largest constraints on private sector development in Africa. Addressing this shortfall requires creating new institutions and building operational and managerial capacity from the ground up (Earne, J. et al., 2014).

However, providing access to microcredit is expensive for lenders because transaction costs are high relative to the small amounts borrowed. Therefore, to encourage microlending, profit-driven microfinance organizations, including retail and furniture stores, are permitted to charge interest rates that are higher than those payable on debt from the formal financial sector. Cash lenders focusing on high-risk segments often charge 30 percent interest per month on a loan with a one-month maturity. Lenders in the informal sector charge even more, around 30 to 100 percent interest per month. This seems extremely high compared to lenders targeting low-risk markets, some of which charge less than three percent per month on loans with maturities of 12 months or more (Dean S. Karlan and Jonathan Zinman, 2008).

Another explanation for this could be that microfinance institutions (MFIs) are often urged by policymakers to increase interest rates in order to eliminate their reliance on subsidies. However, this only makes sense if the poor are actually insensitive to interest rates. Otherwise, increasing interest rates would limit access. Many economic models suggest that loan pricing is tightly related to reliance on subsidies and therefore also to the functioning of the MFI market. However, there is little evidence on interest rate sensitivities in MFI markets (Dean S. Karlan and Jonathan Zinman, 2008).

Therefore, we will replicate the test by Dean S. Karlan and Jonathan Zinman of the hypothesis of price-inelastic demand for consumer credit, using randomized trials conducted by a high-risk consumer lender in South Africa. Furthermore, we will examine the lender’s optimal pricing strategy in terms of profit maximization. The field experiment is based on direct mail offers with randomized individual interest rates and maturities, sent to more than 50,000 former clients of the lender and based on the client’s prior rate.

The problem set consists of several exercises, each covering one or more topics. In the first exercise we will get to know the experiment and have a look at our data set. This includes information on the specific loan offers and client characteristics. The next section validates the integrity of the random assignment of the mail offers using different statistical tests. After that we will explain the theoretical model. Using the theoretical model, we will estimate the price elasticities of loan take-up and of loan size. In the last section we will examine the optimal short-run pricing strategy for the lender by determining the costs of reducing and increasing the loan price.

Good luck and have fun solving this interactive problem set to increase your knowledge about microcredit lending in South Africa!

Exercise 2 – Experimental Design and Data Overview

In this first section we will explain the experiment and get to know the data we use in this interactive R problem set.

Experimental Design

The lender that cooperated with Karlan and Zinman had operated for more than 20 years as one of the largest and most profitable microlenders in South Africa. It competed in a cash loan industry segment that offers small, high-interest, short-term, uncollateralized credit with fixed monthly repayment schedules to the working poor population.

The experiment was conducted via direct mail solicitation in the form of pre-qualified, limited-time offer letters that were mailed to 58,168 former clients of the lender who had good repayment histories and did not have a loan with the lender as of 30 days prior to the mailer. Each offer letter clearly states an interest rate and a maturity, both randomized by the lender, and a loan size based on the client’s last loan size. Most of the offers are at relatively low rates. The individual rate for each client is randomly assigned from a distribution that depends on the client’s risk category, because risk determined the loan price under standard operations. In particular, the standard rate schedule for four-month loans distinguishes low-risk (7.75% per month), medium-risk (9.75% per month) and high-risk (11.75% per month) clients. The maturity suggestion randomization is orthogonal to the offer rate randomization. The difference is that only low- and medium-risk clients received the maturity suggestion randomization. Those who were eligible for it, in other words those eligible for maturities longer than four months, received a randomized example maturity of six or twelve months. High-risk clients were not able to choose maturities longer than four months. The only value presented on the offer letter that is not randomized is the loan size, which is equal to each client’s last loan size. The experiment was carried out in three mailer waves, with start dates grouped geographically by branch: first a pilot test in three branches in July 2003, and then an expansion to the remaining 83 branches in two further mailer waves in September 2003 and October 2003.
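The stratified assignment described above can be illustrated with a small simulation. This is only a sketch under loudly stated assumptions: the clients are made up, and each rate is drawn uniformly on a 0.25-point grid between 3.25 percent and the standard rate of the client's risk category. The lender's actual offer distribution was not uniform and is not reproduced here; the names clients, draw_offer and offer4_sim are hypothetical.

```r
set.seed(1)  # reproducible toy example

# Standard monthly rates per risk category (taken from the text above)
standard_rate <- c(LOW = 7.75, MEDIUM = 9.75, HIGH = 11.75)

# Hypothetical clients: only the risk category matters for the draw
clients <- data.frame(
  id   = 1:9,
  risk = rep(c("LOW", "MEDIUM", "HIGH"), each = 3),
  stringsAsFactors = FALSE
)

# ASSUMPTION: uniform draw on a 0.25-point grid, capped at the standard rate
draw_offer <- function(risk) {
  grid <- seq(3.25, standard_rate[[risk]], by = 0.25)
  sample(grid, size = 1)
}

# One independent draw per client, conditional on the client's risk category
clients$offer4_sim <- unname(vapply(clients$risk, draw_offer, numeric(1)))
clients
```

Because the draw depends only on the risk category, two clients in the same category face the same distribution of offers, which is the idea behind stratifying the randomization by risk.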

Each offer contains a deadline of between two and six weeks. A client accepted an offer by entering a branch office and filling out an application with a loan officer. The loan applications are evaluated according to the lender’s standard procedure, independent of the experimental rates. Following the loan officer’s assessment, the clients are assigned a corresponding loan size and maturity. The following graphic shows the operational steps of the experiment in detail:

Fig. 1 | Operational Steps of Experiment. (Karlan, Dean S. and Zinman, Jonathan, “Credit Elasticities in Less-Developed Economies: Implications for Microfinance” (2008))


Data Overview

The main data set of the experiment is named kz_demandelasts_aer08.dta. The extension .dta indicates that it is a Stata file. The data set contains only the clients who actually received an offer letter from the lender: 1,358 mailers were returned to the lender by the postal service and 3,000 contained atypical relationships between the offer rate and maturity, which leaves a data set of 53,810 clients. The information on the clients ranges from experimental variables to demographic characteristics. Before we can take a look at our data set kz_demandelasts_aer08.dta, we have to read in the Stata data file with the R function read_dta() from the haven package. To be able to use the function, we must first load the package.
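The sample size stated above can be checked with a quick calculation. The variable names are only illustrative; the numbers are the ones given in the text.

```r
intended <- 58168  # offer letters the lender intended to send
returned <- 1358   # mailers returned by the postal service
atypical <- 3000   # offers with atypical rate-maturity relationships

final_sample <- intended - returned - atypical
final_sample  # 53810
```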


Info: How do we read a Stata File in R?

First we have to load the package with the library(package) command. The read_dta(file) function from the haven package then reads a Stata .dta file into a data frame (a tibble) in R.

For further information check: https://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html (this page documents the related function read.dta() from the foreign package, which reads Stata version 5-12 binary files)


1.1) First load the package haven with the library() command. If you think your answer is right press check:

# Load the package with the library() command
library(haven)

1.2) Now you can use the command read_dta() to load the data file kz_demandelasts_aer08.dta and store it in the variable dat.

# # 
# ___ <- ___("kz_demandelasts_aer08.dta")

dat <- read_dta("kz_demandelasts_aer08.dta")

After loading the data, we can now take a closer look at it and at the variables stored in dat.

There are a variety of ways to get an overview of a data set. We will use the R function head() to show the first six rows of our data set stored in dat. An alternative would be to show six randomly sampled rows by using the command sample_n(data, rows) from the dplyr package. This will help us to get familiar with the variables and understand how the data set is structured.
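The difference between the two approaches can be seen on a small made-up data frame. The sketch below uses base R only, so it runs without additional packages; toy and rand_rows are hypothetical names, and dplyr's sample_n(toy, 6) would serve the same purpose as the base R sampling line.

```r
set.seed(42)  # make the random draw reproducible

# Hypothetical data: 20 clients with made-up offer rates
toy <- data.frame(id = 1:20, offer4 = seq(3.25, 8, by = 0.25))

# head() always returns the first six rows
head(toy)

# Six rows drawn at random without replacement
rand_rows <- toy[sample(nrow(toy), 6), ]
rand_rows
```

A random sample can give a more representative impression when the file is sorted, for example by mailer wave, as the first rows of dat appear to be.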

1.3) Show the first six rows of our data set stored in dat. Afterwards press the Check button; it will execute the function and show you whether you are right or wrong:

# Use the function head() to show the first six rows of the data frame
head(dat)
##   sales_grossincome sales_netincome wave grossincome dependants dormancy
## 1                NA              NA    1          NA          5        0
## 2                NA              NA    1          NA          0       23
## 3                NA              NA    1          NA          1       13
## 4                NA              NA    1          NA          0       17
## 5            2.2536         3595.85    1          NA          1        3
## 6                NA              NA    1          NA          0        1
##   itcscore appscore lastterm lastamount   risk itczero offer4 yearlong final4
## 1      601       36        4        600 MEDIUM       0   6.75        0   5.25
## 2      593       21        4        400   HIGH       0  13.25        0  13.25
## 3      672       32        4       1000   HIGH       0  12.50        0  12.25
## 4      697       30        4       1200   HIGH       0   4.25        0   4.25
## 5      605       29        4        300 MEDIUM       0   6.50        0   6.50
## 6        0       23        4        600    LOW       1   5.50        0   5.50
##   trcount tookup tookup_afterdead_enforced applied loansize branchuse
## 1       3      0                         0       0        0       CAB
## 2       1      0                         0       0        0       CPL
## 3       2      0                         0       0        0       CPL
## 4       1      0                         0       0        0       CPL
## 5       2      0                         1       0        0       CAB
## 6      17      0                         0       0        0       CPL
##   normrate_less pstdue_average married female edhi      age      province rural
## 1             1             NA       0      1    0 32.25188 Kwazulu-Natal     0
## 2             0             NA       0      0    0 30.40657 Kwazulu-Natal     0
## 3             0             NA       1      1    0 31.29363 Kwazulu-Natal     0
## 4             1             NA       0      0    0 24.85969 Kwazulu-Natal     0
## 5             1             NA       0      1    1 31.25530 Kwazulu-Natal     0
## 6             1             NA       0      1    0 62.51335 Kwazulu-Natal     0
##   waved1 waved2 waved3 rejected low med onetermshown termshown termshown4
## 1      1      0      0        0   0   1            0        NA         NA
## 2      1      0      0        0   0   0           NA        NA         NA
## 3      1      0      0        0   0   0           NA        NA         NA
## 4      1      0      0        0   0   0           NA        NA         NA
## 5      1      0      0        0   0   1            0        NA         NA
## 6      1      0      0        0   1   0            0        NA         NA
##   termshown6 termshown12 term high itcany appscore0 tookup_outside_only
## 1         NA          NA    0    0      1         0                   0
## 2         NA          NA    0    1      1         0                   0
## 3         NA          NA    0    1      1         0                   1
## 4         NA          NA    0    1      1         0                   0
## 5         NA          NA    0    0      1         0                   0
## 6         NA          NA    0    0      0         0                   0
##   normrate_more grossinterest lntrcount   lnage lnitcscore lnappscore
## 1             0             0   1.09861 3.47358    6.39859    3.58352
## 2             1             0   0.00000 3.41466    6.38519    3.04452
## 3             1             0   0.69315 3.44341    6.51026    3.46574
## 4             0             0   0.00000 3.21325    6.54679    3.40120
## 5             0             0   0.69315 3.44219    6.40523    3.36730
## 6             0             0   2.83321 4.13538         NA    3.13549
##   lnlastamount lnoffer4 lnloansize
## 1      6.39693  1.90954         NA
## 2      5.99146  2.58400         NA
## 3      6.90776  2.52573         NA
## 4      7.09008  1.44692         NA
## 5      5.70378  1.87180         NA
## 6      6.39693  1.70475         NA

As you can see, the head() command returns NA for some variables. This is because a large proportion of clients did not end up taking a loan. Therefore, values of variables that depend on an actual loan, such as lnloansize (the logarithm of loansize, the loan size the client received from the lender), are not available. However, we can have a look at a sample frame of only those clients who actually took up a loan (tookup) to get a better overview of our data. To do so, we have to extract the clients who borrowed from the lender from the data frame using the function filter() from the dplyr package.
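To see which variables are affected, one can count the missing values per column. The following is a minimal sketch on a made-up data frame: the values are hypothetical and only the column names mirror the data set.

```r
# Hypothetical mini data set: three clients without and two with a loan
toy <- data.frame(
  tookup     = c(0, 0, 1, 0, 1),
  loansize   = c(0, 0, 1500, 0, 400),
  lnloansize = c(NA, NA, log(1500), NA, log(400))
)

na_per_column <- colSums(is.na(toy))      # number of NA values per column
na_share      <- mean(is.na(toy$lnloansize))  # share of rows without a log loan size

na_per_column
na_share  # 0.6
```

The same two calls could be applied to dat to find out how many clients have no value for a given variable.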


Info: Function: filter()

The filter(.data, ...) function from the dplyr package is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. The first argument is the data frame, which is passed in automatically when the pipe operator %>% is used; the following arguments contain the conditions, e.g. filter(female == 1). A further argument, .preserve, is only relevant when the data input is grouped, so we will ignore it for now.

For further information check: https://dplyr.tidyverse.org/reference/filter.html
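A short toy example may help to see how filter() treats several conditions and missing values. The data frame toy is made up for illustration; only the column names female and tookup are borrowed from the data set.

```r
library(dplyr)

# Hypothetical clients; client 4 has a missing value for female
toy <- data.frame(
  id     = 1:5,
  female = c(1, 0, 1, NA, 1),
  tookup = c(1, 1, 0, 1, 1)
)

# Several conditions are combined with a logical AND
both <- filter(toy, female == 1, tookup == 1)
both$id   # 1 5

# The row with female = NA is dropped because its condition is not TRUE
women <- toy %>% filter(female == 1)
women$id  # 1 3 5
```

Rows for which a condition evaluates to NA are silently removed, which is worth keeping in mind when filtering on variables with many missing values.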


1.4) Filter the main data dat using the filter() function with the condition tookup == 1 to get the sample data frame dat_takeup, and show its first six rows. Evaluate your answer by pressing the Check button.

library(dplyr)  # provides filter() and the pipe operator %>%

dat_takeup <- dat %>%
  filter(tookup == 1)

head(dat_takeup)
##   sales_grossincome sales_netincome wave grossincome dependants dormancy
## 1           1.85974         1859.74    1          NA          0        0
## 2           1.52000         1683.70    1          NA          0       21
## 3           3.10325         2800.00    1          NA          0        2
## 4           1.22000         1129.70    1          NA          0        3
## 5           7.16250         5103.54    1          NA          1        0
## 6           1.20000          711.50    1          NA          2        3
##   itcscore appscore lastterm lastamount   risk itczero offer4 yearlong final4
## 1      697       33        4       1000 MEDIUM       0  10.50        1   9.75
## 2      652       27        4        500   HIGH       0   4.75        0   4.75
## 3        0       26        4       1000    LOW       1  11.75        1   4.50
## 4        0       30        4        300 MEDIUM       1   4.25        1   4.25
## 5        0       27        6       2000 MEDIUM       1   8.50        1   8.50
## 6      679       28        4        600 MEDIUM       0   9.50        0   9.00
##   trcount tookup tookup_afterdead_enforced applied loansize branchuse
## 1       7      1                         0       1     1500       CPL
## 2       8      1                         0       1      400       CGM
## 3      14      1                         0       1     3000       CPL
## 4      14      1                         0       1      200       CPL
## 5      12      1                         0       1     1500       CPL
## 6      10      1                         0       1      600       CAB
##   normrate_less pstdue_average married female edhi      age      province rural
## 1             0         0.0000       1      1    0 70.57632 Kwazulu-Natal     0
## 2             1        25.8800       0      0    0 41.63176       Gauteng     0
## 3             0         0.0000       1      0    0 50.92403 Kwazulu-Natal     0
## 4             1         0.0000       0      1    0 33.02122 Kwazulu-Natal     0
## 5             1         0.0000       0      1    1 32.91992 Kwazulu-Natal     0
## 6             1       254.4838       0      0    0 32.85695 Kwazulu-Natal     0
##   waved1 waved2 waved3 rejected low med onetermshown termshown termshown4
## 1      1      0      0        0   0   1            0        NA         NA
## 2      1      0      0        0   0   0           NA        NA         NA
## 3      1      0      0        0   1   0            0        NA         NA
## 4      1      0      0        0   0   1            0        NA         NA
## 5      1      0      0        0   0   1            0        NA         NA
## 6      1      0      0        0   0   1            0        NA         NA
##   termshown6 termshown12 term high itcany appscore0 tookup_outside_only
## 1         NA          NA    4    0      1         0                   0
## 2         NA          NA    4    1      1         0                   0
## 3         NA          NA    6    0      0         0                   0
## 4         NA          NA    1    0      0         0                   0
## 5         NA          NA    4    0      0         0                   0
## 6         NA          NA    4    0      1         0                   0
##   normrate_more grossinterest lntrcount   lnage lnitcscore lnappscore
## 1             1         630.0   1.94591 4.25669    6.54679    3.49651
## 2             0          76.0   2.07944 3.72886    6.48004    3.29584
## 3             1        1755.0   2.63906 3.93033         NA    3.25810
## 4             0           8.5   2.63906 3.49715         NA    3.40120
## 5             0         510.0   2.48491 3.49408         NA    3.29584
## 6             0         228.0   2.30259 3.49216    6.52062    3.33220
##   lnlastamount lnoffer4 lnloansize
## 1      6.90776  2.35138    7.31322
## 2      6.21461  1.55814    5.99146
## 3      6.90776  2.46385    8.00637
## 4      5.70378  1.44692    5.29832
## 5      7.60090  2.14007    7.31322
## 6      6.39693  2.25129    6.39693

It shows us six rows of the sample data frame, where each row represents one client. The clients were drawn from the 58,168 former clients of 86 mostly urban branches who had borrowed from the lender in the past 24 months and did not currently have a loan from the lender. Furthermore, we can see 54 columns, each of which stands for one variable. We are going to take a particularly close look at a series of experimental variables. These include the three rates that were assigned to each client: offer4, the randomized individual interest rate of the direct mail offer; final4, the contract rate, which was equal to or lower than the offer rate; and yearlong, the dynamic repayment incentive that extended preferential contract rates for up to one year. Clients who borrowed after the deadline are described by the variable tookup_afterdead_enforced and those who borrowed from another lender by tookup_outside_only. In addition, there are the three example maturities presented in some mailers, termshown4, termshown6 and termshown12 (four, six and twelve months), which help to predict the actual maturity chosen. The variables tookup, applied and onetermshown indicate those clients who borrowed, those who applied and those who were eligible for the maturity suggestion randomization. Finally, as described above, the loan size is given by the variable loansize.

Moreover, we will include demographic characteristics such as female, married, age, edhi (more educated), rural, dependants (number of dependants), grossincome (gross monthly income in thousands of rand), trcount (number of loans with the lender), dormancy (number of months since the last loan with the lender) and the three risk categories low (low risk), med (medium risk) and high (high risk, i.e. neither med nor low).

We will examine the most important variables in the following exercises.


Award: JJ Allaire

You successfully loaded our data file and got to know our most important variables. Now we have a basis on which to build our analysis! JJ Allaire, the founder of RStudio, would be proud of you!

Fig. 2 | JJ Allaire, retrieved from https://www.youtube.com/channel/UC8wovsufF42uACug27bO3wQ


Data Analysis

To begin with, we want to answer the question: how many clients applied for the loan on time and were actually approved to borrow from the lender?

To answer this question we can create an overview of the clients who were intended to receive an offer letter (58,168), those who actually received an offer letter (full sample frame), those who applied for a loan (applied) and those who ended up taking a loan (tookup) in a bar plot using the ggplot2 package.

1.5) Just press Check to create the bar plot.

library(ggplot2)  # provides ggplot() and the geom functions

rec <- 58168             # clients intended to receive an offer letter
full <- nrow(dat)        # clients who actually received an offer letter
app <- sum(dat$applied)  # clients who applied for a loan
bor <- sum(dat$tookup)   # clients who took up a loan

bp_1 <- data.frame(clients = c("Intended to Receive", "Received an Offer Letter", "Applied", "Borrowed"),
                   number = c(rec, full, app, bor))

ggplot(bp_1, aes(x=reorder(clients, -number), y=number)) + 
  geom_bar(stat = "identity", position=position_dodge(), width=0.8, fill = c("darkgreen", "green4", "green3", "green")) +
  geom_text(aes(label=number), vjust=1.8, size=3) +
  labs(title = "Clients Overview", y="Number of Clients") +
  theme(axis.title.x = element_blank())

The bar plot shows us that 58,168 clients were intended to receive an offer letter and 53,810 actually received one. As mentioned earlier, the difference arises because 1,358 mailers were returned to the lender by the postal service and 3,000 contained atypical relationships between the offer rate and maturity. Furthermore, 4,540 clients applied for a loan, which corresponds to an application rate of 8.4 percent. Of those applicants, 86 percent, a total of 3,887, were approved for a loan.
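The overall take-up rate implied by these figures, i.e. the share of all letter recipients who ended up with an approved loan, follows from a one-line calculation. The sketch uses only the two totals quoted above; the variable names are illustrative.

```r
received <- 53810  # clients who received an offer letter
approved <- 3887   # clients who were approved for a loan

takeup_rate <- 100 * approved / received  # in percent
round(takeup_rate, 1)  # 7.2
```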

Now we want to take a look at offer4, the randomized individual interest rate of the direct mail offer. The offer rate was stratified by the client’s pre-approved risk category. Before we look at the distribution of offer rates within each risk category, we show the distribution of the offer rate for the whole data frame, unconditional on risk, with a simple box plot using the ggplot() and geom_boxplot() functions from the ggplot2 package. The box plot will show us the distribution of the offer rate based on the minimum, maximum, median and the first and third quartiles.


Info: ggplot()

The ggplot() function from the ggplot2 package is a simple way to turn data from a data frame, passed via the data argument, into plots. The aes() function defines an aesthetic mapping by selecting the variables to be plotted and specifying how to present them in the graph, e.g. the positions of the x- and y-values or other characteristics such as color, size or shape. The geom functions define the graphical representation of the plotted data, e.g. geom_point() for scatter plots, geom_line() for trend lines, geom_boxplot() for box plots or geom_density() for a kernel density estimate, which is a smoothed version of the histogram. To add a geom to the template, use the + operator. A basic ggplot() template looks as follows: ggplot(data = .data, mapping = aes(x=x, y=y)) + geom_function()

For further information check: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html


1.6) Replace the question marks with the missing values and functions to create a box plot of the distribution of the offer rate. If you think your answer is right, press the Check button.

# dat %>%
#   ggplot(aes(x="Full Data Set", y=???)) +
#   ??? +
#   stat_summary(fun = mean, geom = "point", size=3, col="red", shape=18)

dat %>%
  ggplot(aes(x="Full Data Set", y=offer4)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", size=3, col="red", shape=18)

The box plot shows us the distribution of the offer rate. We can see that the median and the average value of the offer rate are both at approximately 8 percent per month. The minimum offer rate is slightly above 3 percent and the maximum slightly below 15 percent. Furthermore, the first and third quartiles are at around 6 percent and 10 percent.

These values are only rough estimates read off the graphic. We can take a more detailed look at the distribution by using simple mathematical operations, this time conditional on the risk category.
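The quantities a box plot is built from can also be computed directly instead of being read off the graphic. The sketch below uses a handful of made-up offer rates (not taken from dat) and base R's quantile() function. Note that, by default, geom_boxplot() draws its whiskers only up to 1.5 times the interquartile range from the box and marks more extreme points as outliers, so the whisker ends need not coincide with the minimum and maximum.

```r
# Hypothetical offer rates in percent per month
rates <- c(3.25, 5.5, 6, 7.75, 8, 8.5, 10, 11.75, 14.75)

# Minimum, first quartile, median, third quartile and maximum
q <- quantile(rates, probs = c(0, 0.25, 0.5, 0.75, 1))
q  # 3.25, 6, 8, 10, 14.75

# The red diamond in the plot above corresponds to the mean
avg <- mean(rates)
avg
```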


Info: Mathematical Operations

There are a variety of mathematical functions and operators that can be used in R, e.g. mean(.data) to calculate the average value of a data sequence, min(.data) to determine the minimum or max(.data) to determine the maximum value of a data sequence, sum(.data) to calculate the sum of a data sequence, nrow(.data) to determine the number of rows of a data frame or (.data)^n to raise a value to the power of n.
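Applied to a tiny made-up vector, these operations look as follows. The three values and the names x and toy are purely illustrative.

```r
x   <- c(4.25, 6.5, 13.25)    # three made-up offer rates
toy <- data.frame(offer4 = x)  # the same values as a data frame column

mean(x)    # average: 8
min(x)     # smallest value: 4.25
max(x)     # largest value: 13.25
sum(x)     # total: 24
nrow(toy)  # number of rows of a data frame: 3
length(x)  # number of elements of a vector: 3
2^3        # 2 raised to the power of 3: 8
```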


1.7) Calculate the minimum (min_value) and maximum (max_value) values of the offer rate for each risk category by using the group_by() function and mathematical operations. To do so, create a new data frame values. When you have finished the task, press Check.

# values <- dat %>%
#   group_by(???) %>%
#   summarise(min_value=???(offer4), max_value=???(offer4))
# 
# values

values <- dat %>%
  group_by(risk) %>%
  summarise(min_value=min(offer4), max_value=max(offer4))
## `summarise()` ungrouping output (override with `.groups` argument)
values
##     risk min_value max_value
## 1   HIGH      3.25     14.75
## 2    LOW      3.25     11.75
## 3 MEDIUM      3.25     13.75

The data frame shows that the offer rates vary from 3.25% to 11.75% for low-risk clients, up to 13.75% for medium-risk clients and up to 14.75% for high-risk clients.

An interesting function for looking at the distribution of rates is geom_density(). The function is part of the ggplot2 package, which is used for creating elegant data visualizations and is based on “The Grammar of Graphics”. geom_density() is used in combination with the function ggplot(), which is also part of the ggplot2 package. We want to create a density estimate of the offer rate for each of the three risk categories.

1.8) To create a density estimate of the offer rates, just press the Check button.

dat %>%
ggplot(aes(x=offer4, group=risk, fill=risk)) + 
  geom_density(adjust = 1.5, alpha = 0.4) +
  scale_fill_discrete(name = "Risk Category", breaks = c("LOW", "MEDIUM", "HIGH")) +
  xlab("Randomized Offer Rate (%)") +
  ylab("Density") +
  geom_vline(xintercept= c(7.75, 9.75, 11.75), col = c("green", "blue", "red"))

The graphic shows the distribution of the offer rates by risk category. We can see that the offer rates of all clients are mostly below the standard rate of their category. The lowest randomized offer rate is slightly above 3% per month and the highest is slightly below 15% per month. Looking at the individual risk categories, the offer rate of low-risk clients has its highest density, around 0.3, between slightly less than 6% and slightly more than 7%. For medium-risk clients, the offer rate has its highest density, about 0.22, between 7% and a bit more than 9%. For high-risk clients, the density is highest, at about 0.16, at around 7.5% and between 9% and 11%.
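Instead of reading the peaks off the plot, the location and height of a density maximum can be computed numerically. The sketch below uses base R's density() function, a kernel density estimator comparable to the one behind geom_density(), on simulated rates. The simulated data (normal with mean 6.5 and standard deviation 1) are an assumption for illustration and do not reproduce the distribution in dat.

```r
set.seed(123)

# Hypothetical offer rates for one risk category
sim_rates <- rnorm(1000, mean = 6.5, sd = 1)

# Kernel density estimate with the same bandwidth adjustment as in the plot
d <- density(sim_rates, adjust = 1.5)

peak_rate    <- d$x[which.max(d$y)]  # rate at which the density is highest
peak_density <- max(d$y)             # height of the density at that point

c(peak_rate = peak_rate, peak_density = peak_density)
```

To apply this to the real data, one could replace sim_rates with the offer4 values of a single risk category.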

Geographical Display

Another aspect that could be interesting is the geographic dimension. Our data frame dat contains a column named province, which contains the province in which the applicant applied for a loan. Hence we can create a new data frame grouped by province by using the functions group_by() and summarise().


Info: Function: group_by() & summarise()

The group_by(.data, ...) function from the dplyr package takes an existing data frame and converts it into a grouped data frame in which operations are performed per group. Grouped data is often used with the function summarise() from the same package. summarise() applies an operation to each group.

For further information check: https://dplyr.tidyverse.org/reference/group_by.html and https://dplyr.tidyverse.org/reference/summarise.html
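
As a minimal sketch of how the two functions work together, the following chunk uses a small made-up data frame. The names toy, region and rate are hypothetical and not part of dat.

```r
library(dplyr)

# Hypothetical toy data, not taken from the experiment
toy <- data.frame(
  region = c("A", "A", "B", "B", "B"),
  rate   = c(6, 8, 7, 9, 11)
)

# One row per region with the mean rate and the number of rows
toy_summary <- toy %>%
  group_by(region) %>%
  summarise(mean_rate = mean(rate), n_obs = n())

toy_summary
```

The result has one row for region A and one for region B, which is the same pattern we apply to the provinces in the next task.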


1.9) Group the data frame dat by the province the applicant applied in using the function group_by(). The filter condition is already given. The new data frame should be named zaf. Then use the function summarise() to apply the given operations to each group. Afterwards show the new data frame zaf.

# ___ <- ___ %>%
#   filter(tookup == 1) %>%
#   ___ %>%
#   ___("Average Offer Rate" = round(mean(offer4, na.rm=TRUE),3),
#                                                       "Average Interest Rate" = round(mean(final4, na.rm=TRUE),3),
#                                                       "Average Dynamic Repayment Incentive" = round(mean(yearlong, na.rm=TRUE),3),
#                                                       "Average Loansize" = round(mean(loansize, na.rm=TRUE),3),
#                                                       "Average Maturity" = round(mean(term, na.rm=TRUE),3),
#                                                       "Number of Clients" = sum(tookup)
#                                                       )
# 
# # show the data frame
# 

zaf <- dat %>%
  filter(tookup == 1) %>%
  group_by(province) %>%
  summarise("Average Offer Rate" = round(mean(offer4, na.rm=TRUE),3),
                                                      "Average Interest Rate" = round(mean(final4, na.rm=TRUE),3),
                                                      "Average Dynamic Repayment Incentive" = round(mean(yearlong, na.rm=TRUE),3),
                                                      "Average Loansize" = round(mean(loansize, na.rm=TRUE),3),
                                                      "Average Maturity" = round(mean(term, na.rm=TRUE),3),
                                                      "Number of Clients" = sum(tookup)
                                                      )
## `summarise()` ungrouping output (override with `.groups` argument)
zaf
##           province Average.Offer.Rate Average.Interest.Rate
## 1     Eastern Cape              7.582                 7.059
## 2       Free State              7.374                 6.677
## 3          Gauteng              7.463                 6.915
## 4    Kwazulu-Natal              7.183                 6.382
## 5 Limpopo Province              7.350                 6.678
## 6       Mpumalanga              7.948                 7.458
## 7       North West             10.490                 6.750
## 8     Western Cape              7.610                 6.988
##   Average.Dynamic.Repayment.Incentive Average.Loansize Average.Maturity
## 1                               0.489         1511.607            4.964
## 2                               0.688          959.375            3.906
## 3                               0.506         1512.028            4.950
## 4                               0.416         1375.343            4.748
## 5                               0.553         1597.970            5.244
## 6                               0.545         1447.727            5.409
## 7                               0.000         1000.000            4.000
## 8                               0.538         1313.034            4.333
##   Number.of.Clients
## 1               280
## 2                64
## 3              1268
## 4              1821
## 5               197
## 6                22
## 7                 1
## 8               234

As you can see, our newly created data frame is grouped into eight provinces. Although South Africa is actually divided into nine provinces, the Northern Cape is missing because no client applied for a loan in this province. We want to create an interactive map of South Africa later in this problem set, so we have to add the Northern Cape as an additional row and store the result in the data frame provinces.

1.10) Just press Check to add the Northern Cape to the data frame.

provinces <- rbind(zaf, c("Northern Cape", NA, NA, NA, NA, NA, NA))

provinces$province <- c("Eastern Cape", "Free State", "Gauteng", "KwaZulu-Natal", "Limpopo", "Mpumalanga", "North West", "Western Cape", "Northern Cape")
names(provinces)[names(provinces) == "province"] <- "NAME_1"
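
One detail of the chunk above is worth knowing: appending a row with rbind() and a character vector turns the numeric columns into text. The following sketch illustrates this on a hypothetical toy data frame (toy, coerced and kept are made-up names) and shows bind_rows() from dplyr as one possible alternative that keeps the column types.

```r
library(dplyr)

# Hypothetical toy data with one text and one numeric column
toy <- data.frame(province = c("A", "B"), rate = c(7.5, 8.1),
                  stringsAsFactors = FALSE)

# rbind() with a character vector turns the numeric column into text
coerced <- rbind(toy, c("C", NA))
class(coerced$rate)

# bind_rows() fills the missing column with NA and keeps it numeric
kept <- bind_rows(toy, data.frame(province = "C", stringsAsFactors = FALSE))
class(kept$rate)
```

This is presumably why some columns are converted back with as.numeric() before the map is drawn later in this exercise.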

As you can see the provinces of South Africa are: Eastern Cape, Free State, Gauteng, KwaZulu-Natal, Limpopo, Mpumalanga, Northern Cape, North West and Western Cape.

Before we continue with our analysis, try to answer the short question:

Quiz: Why did so many clients apply for a loan in Gauteng and none in the Northern Cape?

  • The Northern Cape is a highly urbanized province and Gauteng consists mostly of sedimentary rocks. [ ]

  • Gauteng is a highly urbanized province and the Northern Cape consists mostly of sedimentary rocks. [x]

The provinces of South Africa vary substantially in size. The smallest but most densely populated one is Gauteng, a highly urbanized region that includes the cities of Johannesburg, Ekurhuleni (East Rand) and Pretoria, three of the five largest cities in the country. The largest but most sparsely populated province, on the other hand, is the Northern Cape, which is slightly larger than Germany (Government of South Africa: “South Africa’s Provinces”, at: https://www.gov.za/about-sa/south-africas-provinces, retrieved 20 January 2021). To get a better idea of these differences, let us add the population of every province to our data. To do this, we add a new variable, that is, a new column, called Population to the data frame. A helpful function for adding a variable to a data frame is mutate().


Info: Function: mutate()

The mutate() function from the dplyr package creates new variables. A call to mutate() typically has three parts: first the data frame you want to modify, second the name of the variable you want to create, and third the values you want to assign to the new variable -> mutate(dataframe, new_variable = new_values)

For further information check: https://dplyr.tidyverse.org/reference/mutate.html
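
A small sketch on made-up data may help. The data frame toy and its columns are hypothetical. It also shows that a column created inside mutate() can be used right away in the same call.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(province = c("A", "B", "C"), clients = c(10, 20, 5))

# Add a population column, then derive a second column from it
toy <- mutate(toy,
              population = c(1000, 4000, 500),
              clients_per_1000 = clients / population * 1000)
toy
```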


1.11) Add a new column to our existing data frame provinces which includes the population of each province using the mutate function. You can use the given vector c(6734001, 2928903, 15488137, 11531628, 5852553, 4679786, 4108816, 7005741, 1292786) as the values we want to assign to the new variable Population. When you have finished the task, press Check.

# # Replace the question marks with your answer
# provinces <- ???(???, ??? = ???)

provinces <- mutate(provinces, "Population" = c(6734001, 2928903, 15488137, 11531628, 5852553, 4679786, 4108816, 7005741, 1292786))

The population data is provided by the South African government. Check out https://www.gov.za/about-sa/south-africas-provinces# for more information.

Now we can create an interactive map of South Africa. The map shows how the clients are distributed geographically. We can see that the majority of the clients are from Gauteng and KwaZulu-Natal, which are also the provinces with the most residents, as shown by the size of the black bubbles. As mentioned earlier, we have no clients in the Northern Cape and only one in North West. The average offer rate, which is shown by color, varies from 7.18 in KwaZulu-Natal to 10.49 in North West. However, the average offer rate in North West is not very informative because it is based on a single client. The average offer rates of the remaining provinces are very similar. For further information, just click on the province you are interested in.

1.12) Press Check in order to get an interactive map of South Africa. The color portrays the average offer rate and the bubble size the population of the selected province. You can get information about a province by just clicking on it.

library(sf)
library(ggplot2)
library(tmap)
library(tmaptools)
library(leaflet)
library(dplyr)
# set options so numbers don't display as scientific notation
options(scipen=999)

# reads the features from the shapefile
mymap <- st_read("data/gadm36_ZAF_shp/gadm36_ZAF_1.shp", stringsAsFactors=FALSE)
## Reading layer `gadm36_ZAF_1' from data source `/Users/finndeike/Documents/GitHub/ImplicationsForMicrofinance/data/gadm36_ZAF_shp/gadm36_ZAF_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 16.45189 ymin: -34.83514 xmax: 32.89125 ymax: -22.12503
## geographic CRS: WGS 84
# joining the data of the shapefile and our dataframe
map_data <- inner_join(mymap, provinces)
## Joining, by = "NAME_1"
map_data$`Average Interest Rate` <- as.numeric(map_data$`Average Interest Rate`)
map_data$`Average Loansize` <- as.numeric(map_data$`Average Loansize`)

mymap <- tm_shape(map_data) +
  tm_polygons("Average Offer Rate",
              id = "NAME_1",
              style = "quantile",
              palette="Greens",
              popup.vars=c("Population", "Number of Clients", "Average Interest Rate",  "Average Offer Rate", "Average Dynamic Repayment Incentive","Average Loansize", "Average Maturity")) +
  tm_bubbles(size = "Population", col= "black", id="NAME_1", scale = 2) +
  tm_borders() +
  tm_scale_bar()


tmap_leaflet(mymap)
## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).
## Legend for symbol sizes not available in view mode.
# tmap mode set to interactive leaflet map

Award: Bartolomeu Dias

Great, you created a map of South Africa which shows the most important average characteristics of the loan contracts for each province! Maybe you now have an idea of how Bartolomeu Dias felt when he was the first European to explore the coastline of South Africa!

Fig. 3 | Bartolomeu Dias, retrieved from https://escola.britannica.com.br/artigo/Bartolomeu-Dias/481145


Exercise 3 – Randomization Process Validation

In the last section we loaded the data file, got a rough look at our data and created an interactive map of South Africa. Furthermore, we explained the experiment. In this part we will check whether the randomization of the experiment worked as intended. We will look in particular at the offer rate offer4 that we used earlier.

2.1.1) To load in the data just press Check.

dat <- read_dta("kz_demandelasts_aer08.dta")
dat <- dat %>% mutate(itcscore_100 = itcscore/100, appscore_100 = appscore/100)

Earlier it was mentioned that the assigned rates are uncorrelated with other given information such as the external or internal credit score. Let us check whether this assumption holds in the data. We will now run a linear regression with the lm() function and check whether the offer rate is actually unrelated to other observable characteristics. The value we will look at most closely is the p-value.


Info: Linear Regression with lm()

The basic R function lm() is used to fit linear models. It estimates the relationship between a dependent variable \(y\) and the independent variables \(x_1, x_2,...,x_n\). The R command summary() creates a compact summary of the linear regression. Here is a basic template of an lm() regression and a summary() output:

reg <- lm(y ~ x1 + x2, data=data)

summary(reg)

For further information check: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm and https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/summary
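
To see the template in action without touching our data, here is a minimal sketch on simulated data. All names (x1, risk, rate, sim_reg) are hypothetical. The rate is generated so that it depends on risk but not on x1.

```r
set.seed(1)

# Simulated data: the rate is drawn independently of the covariate x1
n    <- 500
x1   <- rnorm(n)
risk <- rbinom(n, 1, 0.5)
rate <- 8 - 2 * risk + rnorm(n)   # depends on risk only, by construction

sim_reg <- lm(rate ~ x1 + risk)

# Coefficient table: estimates, standard errors, t values and p-values
coef(summary(sim_reg))
```

The last column of the coefficient table holds the p-values, which is the column we will inspect for our own regression.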


The p-value will give us information about the outcome of the randomization. If our hypothesis is correct, the p-value will be comparatively high for each variable and higher than the significance level.

We will also add control variables for the month of the offer (waved2, waved3), the lender-defined risk level of the client prior to the experiment (low, med) and further characteristics of the client, to avoid endogeneity problems. Control variables are variables which are held constant in the analysis. They are not the variables of primary interest, but they may still influence the outcome and therefore have to be accounted for.

2.2.1) Task: Regress the offer rate offer4 on the given variables using the lm() regression and store it in the variable reg2_1. To do so, replace the question marks in the given code with your answer.

# reg2_1 <- ???(??? ~ dormancy + lntrcount + female + dependants + married + lnage + rural + edhi + itcscore_100 + itczero + appscore_100 + low + med + waved2 + waved3, data=dat)
# 
# summary(???)

reg2_1 <- lm(offer4 ~ dormancy + lntrcount + female + dependants + married + lnage + rural + edhi + itcscore_100 + itczero + appscore_100 + low + med + waved2 + waved3, data=dat)

summary(reg2_1)
## 
## Call:
## lm(formula = offer4 ~ dormancy + lntrcount + female + dependants + 
##     married + lnage + rural + edhi + itcscore_100 + itczero + 
##     appscore_100 + low + med + waved2 + waved3, data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.484 -1.423  0.461  1.844  6.167 
## 
## Coefficients:
##               Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)   8.650802   0.171447  50.458 < 0.0000000000000002 ***
## dormancy      0.001390   0.002005   0.693                0.488    
## lntrcount     0.003727   0.013494   0.276                0.782    
## female        0.023754   0.022137   1.073                0.283    
## dependants    0.000203   0.006758   0.030                0.976    
## married       0.016524   0.022839   0.723                0.469    
## lnage        -0.002388   0.048166  -0.050                0.960    
## rural         0.019894   0.029366   0.677                0.498    
## edhi         -0.012782   0.022172  -0.577                0.564    
## itcscore_100  0.005023   0.014248   0.353                0.724    
## itczero       0.035417   0.096662   0.366                0.714    
## appscore_100 -0.064632   0.135196  -0.478                0.633    
## low          -2.486245   0.038698 -64.247 < 0.0000000000000002 ***
## med          -1.075115   0.041200 -26.095 < 0.0000000000000002 ***
## waved2       -0.291439   0.039424  -7.392  0.00000000000014625 ***
## waved3       -0.296611   0.038030  -7.799  0.00000000000000633 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.327 on 53538 degrees of freedom
##   (256 observations deleted due to missingness)
## Multiple R-squared:  0.1121, Adjusted R-squared:  0.1119 
## F-statistic: 450.7 on 15 and 53538 DF,  p-value: < 0.00000000000000022

Info: Evaluation of a Linear Regression

The p-value is the level of marginal significance within a statistical hypothesis test. It is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. The value always lies between 0 and 1. The lower the p-value, the stronger the evidence against the null hypothesis. A p-value below 0.05 is conventionally regarded as statistically significant. In the regression output, the significance level is also indicated by little stars.
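
As a small worked example, the two-sided p-value of a coefficient can be recomputed from its t value and the residual degrees of freedom. The numbers below are copied from the dormancy row of summary(reg2_1). The variable names t_value, df and p_value are only illustrative.

```r
# Values copied from the dormancy row of summary(reg2_1)
t_value <- 0.693
df      <- 53538

# Two-sided p-value: probability of |t| at least this large under H0
p_value <- 2 * pt(-abs(t_value), df)
round(p_value, 3)
```

Up to rounding of the t value, this reproduces the p-value of 0.488 reported in the summary.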

Check out https://www.investopedia.com/terms/p/p-value.asp for further information.


In fact, the p-values of all observable client characteristics (dormancy through appscore_100) are well above the conventional significance levels. This indicates that the offer rate is very likely independent of these characteristics and thus that the randomization process was successful.
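
Looking at many individual p-values can be complemented by a joint test. The following is only a sketch on simulated data and not part of the original analysis: it compares a regression with the controls only against one that adds the client characteristics, using anova(). All variable names here are hypothetical.

```r
set.seed(42)

# Simulated stand-in for the experiment: the rate depends on the risk
# category only, not on the client characteristics x1 and x2
n    <- 2000
low  <- rbinom(n, 1, 0.3)
x1   <- rnorm(n)
x2   <- rbinom(n, 1, 0.5)
rate <- 10 - 2.5 * low + rnorm(n, sd = 2)

restricted   <- lm(rate ~ low)            # controls only
unrestricted <- lm(rate ~ low + x1 + x2)  # controls plus characteristics

# F-test of H0: the coefficients of x1 and x2 are jointly zero
joint_test <- anova(restricted, unrestricted)
joint_test
```

A large p-value in the second row would be in line with a successful randomization. The same idea could be applied to reg2_1 by dropping the client characteristics from the restricted model.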

We can also visualize the effects that the different explanatory variables have on the dependent variable with the function effectplot() of the regtools package. The basic idea of this plot is to compare the effects of the explanatory variables when they change from their 10 percent to their 90 percent quantile or, for a binary variable, when it switches between 0 and 1. To create this kind of plot, we just pass our regression to the function: effectplot(reg) (for further information see: https://github.com/skranz/regtools/blob/master/man/effectplot.Rd).

2.2.2) Create an effectplot() of our regression reg2_1. Afterwards, press the check button.

# Create an effectplot using the effectplot() function of the regression reg2_1
 effectplot(reg2_1)
## Warning: Ignoring unknown aesthetics: ymax, ymin

The created plot confirms our assumption, that the randomization process was successful. The only explanatory variables which influence our response variable significantly are the control variables of the month of the offer and the risk category of the client.

Since we have just learned that it is reasonable to assume that the randomization process was successful, we will now check whether the offer rate influenced whether clients borrowed after the given deadline. For this we will use a probit regression, which is similar to a linear regression. The difference is that a probit regression is a binomial regression, which means the outcome is either a success (1) or a failure (0), whereas in a linear regression the outcome is numerical.


Info: Probit Regression with glm()

The glm() function from the stats package is used to fit generalized linear models. Its structure is similar to that of lm(), except that the argument family specifies which kind of generalized linear model we want to fit. In our case we want to use a probit link function, so the call looks as follows: glm(formula, family = binomial(link = "probit"), data)

Check out https://are.berkeley.edu/courses/EEP118/fall2010/section/13/Section%2013%20Handout%20Solved.pdf for further information.
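
The following sketch fits a probit model on simulated data, so that the call can be tried out in isolation. The names price, y and probit_fit are hypothetical and the numbers have nothing to do with the experiment.

```r
set.seed(123)

# Simulated binary outcome whose probability falls as the price rises
n     <- 1000
price <- runif(n, 3, 12)
y     <- rbinom(n, 1, pnorm(0.5 - 0.15 * price))

probit_fit <- glm(y ~ price, family = binomial(link = "probit"))
coef(probit_fit)

# Predicted probabilities always lie between 0 and 1
range(fitted(probit_fit))
```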


The theoretical model of a probit regression will be explained in more detail in Exercise 4.

2.2.3) Use a generalized linear model to run a probit regression of tookup_afterdead_enforced (take-up after the deadline) on the offer rate. To do so, change the code from a simple linear regression to a generalized linear probit regression. If you need help, press Hint.

# reg2_2 <- lm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, data=dat)
# summary(reg2_2)

reg2_2 <- glm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=dat)
summary(reg2_2)
## 
## Call:
## glm(formula = tookup_afterdead_enforced ~ offer4 + low + med + 
##     waved2 + waved3, family = binomial(link = "probit"), data = dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0030  -0.4755  -0.4742  -0.4665   2.1327  
## 
## Coefficients:
##               Estimate Std. Error z value            Pr(>|z|)    
## (Intercept) -0.9695075  0.0333609 -29.061 <0.0000000000000002 ***
## offer4      -0.0005594  0.0029844  -0.187               0.851    
## low          0.7057192  0.0198982  35.467 <0.0000000000000002 ***
## med          0.5628594  0.0213071  26.416 <0.0000000000000002 ***
## waved2      -0.2892394  0.0230624 -12.542 <0.0000000000000002 ***
## waved3      -0.2709625  0.0222376 -12.185 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 44956  on 53809  degrees of freedom
## Residual deviance: 42776  on 53804  degrees of freedom
## AIC: 42788
## 
## Number of Fisher Scoring iterations: 4

As we can see, the p-value of offer4 is 0.851, which is considerably higher than any conventional significance level. This supports the view that the randomized offer rates did not influence the take-up after the deadline. This is plausible, since the clients borrowed at the standard rate schedule after the deadline.

We can also look at the relationship between the take-up after the deadline and the offer rate graphically. To do so, we will use the ggplot() function again to declare the input data frame, then we will use stat_smooth() (explained in detail later) to plot a probit fit.

2.2.4) Plot the probit regression reg2_2. Just press Check.

# x must be the offer rate alone (summing the controls inside aes() would
# change the x-axis), and the probit family is passed via method.args
ggplot(dat, aes(x = offer4, y = tookup_afterdead_enforced)) +
  stat_smooth(method = "glm", method.args = list(family = binomial(link = "probit"))) +
  ylim(0, 1) +
  ylab("Borrowed after the deadline") +
  xlab("Offer Rate")
## `geom_smooth()` using formula 'y ~ x'

The graph corroborates our thesis since there is no clear tendency that the probability of a take-up after the deadline increases or decreases with a higher offer rate.

In addition, we want to find out whether the rejection decisions were correlated with the randomly assigned offer rate. The approach is the same as in the previous regression. However, this time we have to reduce the data frame to only those clients who applied for a loan. Furthermore, we will use a new function called stargazer() to get a nicer output table of the regression results.


Info: stargazer()

stargazer is an R package that creates LaTeX code, HTML code and ASCII text for regression tables with multiple models side by side, as well as for summary statistics tables, data frames, vectors and matrices. We use stargazer because the output is well formatted, it supports many model types and it offers many options to adjust the appearance of the table. To create a summary statistics table we just have to run stargazer(data). To create a regression table we proceed in the same way, but we want to make some adjustments. With the argument type we can determine the output format. The argument header indicates whether a header (name and version of the package, author, …) should appear.

For further information check: https://cran.r-project.org/web/packages/stargazer/stargazer.pdf


2.2.5) Execute the regression reg2_3, then use the function stargazer() to create a nicer-looking output table of the regression. We want to set type="html" and header=FALSE. If you need a little hint, press Hint.

# reg2_3 <- glm(rejected ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(dat, applied == 1))
# 
# stargazer(???)

reg2_3 <- glm(rejected ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(dat, applied == 1))

stargazer(reg2_3, type="html", header=FALSE)
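=============================================
                      Dependent variable:
                  ---------------------------
                           rejected
---------------------------------------------
offer4                       0.009
                            (0.011)

low                        -0.626***
                            (0.069)

med                        -0.285***
                            (0.065)

waved2                     -0.170**
                            (0.072)

waved3                     -0.597***
                            (0.073)

Constant                   -0.617***
                            (0.111)
---------------------------------------------
Observations                 4,540
Log Likelihood            -1,777.595
Akaike Inf. Crit.          3,567.191
=============================================
Note:             *p<0.1; **p<0.05; ***p<0.01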

We see that the coefficient of offer4 is again far from significant, which supports the view that the rejection decision was not influenced by the offer rate.

To sum up, we found that the randomization process was successful: the offer rate is not related to other observable characteristics.


Award: Adrien-Marie Legendre

You successfully completed the first part of our regression analysis! You are on your way to becoming the next Adrien-Marie Legendre, who was the first person to publish the method of least squares!


Exercise 4 – Theoretical Model

In the first two sections we got familiar with our parameters and verified the randomization process. Now we want to understand the empirical strategy.

Basic Model

So far we have learned a lot about our data and the randomization process. In this section we will apply this knowledge to map our data into testable predictions. More precisely, we are interested in the response of loan demand to changes in price and maturity. Our basic model for the estimation is the following: \[y_i = f(C_i, X_i)\] In our model \(y_i\) measures the extensive (take-up) and intensive (loan size) demand, and \(i\) indexes the 53,810 borrowers. The variable \(C_i\) is a vector including the offer rate (\(r_i\)) and the maturity (\(m_i\)). The variables we used for the randomization process of \(r_i\) - the pre-approved risk category (low/medium/high) and the mailer wave (July/September/October) - are included in \(X_i\).

Linear Probability Model

We often use binary variables as independent variables in regressions. In our case, however, the dependent variable is binary: it equals 1 if something occurs and 0 otherwise. It is still possible to use the ordinary least squares method (OLS) when the dependent variable \(Y\) is binary. The resulting model is called the Linear Probability Model (LPM): \[Y_i = \beta_0 + \beta_1{X_{1i}} + ... + \beta_k{X_{ki}} + \epsilon_i\] Because \(Y\) only takes the values 0 and 1, its conditional expectation equals the probability that \(Y = 1\): \[E[Y|X] = P(Y=1 | X)\] Assuming that the expectation of the error term \(\epsilon_i\) given \(X\) is 0, the model above implies: \[P(Y=1 | X) = E[Y|X] = \beta_0 + \beta_1{X_{1i}} + ... + \beta_k{X_{ki}}\] In other words, the coefficient \(\beta_j\) can be interpreted as the change in the probability that \(Y=1\) associated with a one-unit increase in \(X_j\), holding the other regressors constant.

However, the classic LPM has a fundamental problem. We will show this using the mtcars data set, which ships with base R in the datasets package. The following plots show the difference between an LPM and a probit or logit model, and why probit or logit models are better suited for our future regressions. Just press Check and have a look at the plots to answer the question below:

lm_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() + 
  stat_smooth(method="lm", se=TRUE) +
  geom_hline(yintercept=0, col="red") +
  geom_hline(yintercept=1, col="red") +
  ylim(-0.25, 1.5) +
  ggtitle("Linear Probability Model")

probit_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() + 
  stat_smooth(method="glm", method.args=list(family=binomial(link="probit")), se=TRUE) +
  geom_hline(yintercept=0, col="red") +
  geom_hline(yintercept=1, col="red") +
  ylim(-0.25, 1.5) +
  ggtitle("Probit Model")

logit_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() + 
  stat_smooth(method="glm", method.args=list(family=binomial(link="logit")), se=TRUE) +
  geom_hline(yintercept=0, col="red") +
  geom_hline(yintercept=1, col="red") +
  ylim(-0.25, 1.5) +
  ggtitle("Logit Model")

grid.arrange(lm_plot, probit_plot, logit_plot, ncol=2)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Quiz: What could be the problem with the Linear Probability Model?

  • It is not possible to use the LPM if the dependent variable is binary. [ ]

  • In an LPM it is possible to get a probability below zero or above 1. [x]

  • There is no problem with the LPM. [ ]

As you can see in the plots above, probit and logit models are better suited for a regression with a binary dependent variable. They are in fact specifically made for this case and always result in a probability between 0 and 1. Now we have to choose between the probit and the logit model. The main difference between the two lies in the assumption made about the error distribution. Logit assumes a logistic error distribution while probit assumes a normal error distribution. Since there is a lot more known about the normal error distribution, we will use the probit model in our future regressions with a binary dependent variable. We can now use it to analyze the response of loan demand to changes in price and maturity.
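
The point of the plots can also be checked numerically. The sketch below fits the same two kinds of models to mtcars and compares the ranges of the fitted values. The object names lpm and probit are only illustrative.

```r
# Linear probability model and probit model on the built-in mtcars data
lpm    <- lm(vs ~ mpg, data = mtcars)
probit <- glm(vs ~ mpg, family = binomial(link = "probit"), data = mtcars)

# The LPM predicts "probabilities" below 0 and above 1 ...
range(fitted(lpm))

# ... while the probit predictions stay inside the unit interval
range(fitted(probit))
```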

Probit Model

A difficulty in estimating loan demand elasticities is that the contract terms are often subject to external influences, such as alternative financing opportunities or other supply decisions. For the price sensitivity, we approach this problem by randomizing the interest rate based on the client's risk category. This allows us to observe what happens if we change the loan price, in our case the interest rate. To do so, we estimate a probit model of the form:

\[ a_i = \alpha + \beta{r_i} + \delta{X_i} + \epsilon_{ib} \]

In our model \(a_i\) is the dependent variable applied, which is 1 if the client \(i\) applied for a loan and 0 if he or she did not. The offer rate offer4 (\(r_i\)) is orthogonal to the error term \(\epsilon_{ib}\) by construction and therefore \(\beta\) is an unbiased estimate of the price sensitivity of loan take-up from direct mail offers. We will assume that \(\beta < 0\) since almost every model of consumer choice predicts that demand is downward sloping in price.
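
Written as a probability statement, and assuming the usual reading of the equation above as a latent index with a standard normal error term, the probit model and the implied marginal effect of the offer rate can be sketched as follows. Here \(\Phi\) denotes the standard normal distribution function and \(\phi\) its density. Both symbols are introduced only for this illustration.

```latex
\[ P(a_i = 1 \mid r_i, X_i) = \Phi(\alpha + \beta r_i + \delta X_i) \]
\[ \frac{\partial P(a_i = 1 \mid r_i, X_i)}{\partial r_i} = \phi(\alpha + \beta r_i + \delta X_i)\,\beta \]
```

Since \(\phi\) is always positive, the marginal effect has the same sign as \(\beta\), but its size depends on the values of \(r_i\) and \(X_i\).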


Award: Gustav Fechner

Wow, you now understand the probit model. Its basic model is based on the Weber-Fechner law formulated by Gustav Fechner and published in Fechner (1860). He was a German experimental psychologist, philosopher and physicist and he is said to be the founder of psychophysics.

Fig. 5 | Gustav Theodor Fechner, retrieved from https://vlp.mpiwg-berlin.mpg.de/people/data?id=per68


Since you now learned a little bit about our theoretical model, let’s do a short Quiz based on our data:

Quiz: What do you think would happen to the take-up if we increased the interest rate by 100 basis points? Just type “reduces” or “increases” in the blank box.

Answer: reduces

In the next section we will deal with this subject more intensively.

Exercise 4.1 – Price Elasticity of Loan Take-Up

So far we have talked a lot about our data set and our theoretical strategy. In this section we want to get our first tangible results: in the first part of Exercise 4, we will estimate the price elasticity of loan demand.

4.1.1) Since this is a new exercise, press Check to load in the data once again.

dat <- read_dta("kz_demandelasts_aer08.dta")

Extensive Margin - Loan Price

As mentioned in the last section, we use a probit model to estimate the price elasticity of loan demand. We begin with borrowers who applied for a loan before the deadline. Instead of the lm() function, we will use the generalized linear model function glm(), because it enables us to perform a probit regression.

Marginal Effects

We want to start with a regression limited to clients who received offers at or below the standard rate of their risk category. In the second regression we want to focus on clients who received a higher rate than their interest rate under standard operations. The variable normrate_less indicates whether a client received an offer rate at or below their interest rate under standard operations (1) or not (0). As a reminder, the interest rate per month under standard operations was 7.75 percent for a low risk client, 9.75 percent for a medium risk client and 11.75 percent for a high risk client. For the first regression we have to reduce the data set with the filter() function to only those clients who received an offer at or below the standard rate of their risk category (normrate_less == 1). Recall that applied is our binary dependent variable, and that offer4 and the experimental validation variables low, med, waved2 and waved3 are our explanatory, or independent, variables.

4.1.2) Use glm() to perform our first probit regression. Regress applied on offer4, low, med, waved2 and waved3 and filter our data set dat for normrate_less == 1. Store this regression in reg3_1. Afterwards display the regression results with the summary() command. Finally press Check or, if you need help, press Hint.

# Perform a Probit regression and show the regression results
reg3_1 <- glm(applied ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(dat, normrate_less == 1))
summary(reg3_1)
## 
## Call:
## glm(formula = applied ~ offer4 + low + med + waved2 + waved3, 
##     family = binomial(link = "probit"), data = filter(dat, normrate_less == 
##         1))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6919  -0.3799  -0.3514  -0.3337   2.4444  
## 
## Coefficients:
##             Estimate Std. Error z value             Pr(>|z|)    
## (Intercept) -1.32864    0.03929 -33.815 < 0.0000000000000002 ***
## offer4      -0.01998    0.00361  -5.536         0.0000000309 ***
## low          0.57798    0.02270  25.461 < 0.0000000000000002 ***
## med          0.59713    0.02369  25.210 < 0.0000000000000002 ***
## waved2      -0.04692    0.02892  -1.622              0.10472    
## waved3      -0.07745    0.02820  -2.746              0.00603 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30825  on 53177  degrees of freedom
## Residual deviance: 29453  on 53172  degrees of freedom
## AIC: 29465
## 
## Number of Fisher Scoring iterations: 5

From the summary we can see that the offer rate offer4 is significant at the 0.1 percent level, which means that a coincidental connection between applied and offer4 is very unlikely. Unfortunately, probit coefficients are not equal to marginal effects, so we cannot read the size of the effect directly from this output. We can say, though, that an increase of the offer rate by 100 basis points is negatively associated with loan take-up: clients who received an offer at or below the standard rate are less likely to apply if the offer rate is 100 basis points higher. We get similar results if we examine the opposite group of clients, those who received an offer rate higher than their standard one (normrate_more == 1), in the regression reg3_2 below. Here we remove the control variables waved2 and waved3, because no client who was assigned a higher offer rate than the standard for their risk category received an offer in the second or third mailer wave. The lender was primarily interested in testing price sensitivity at low rates, and therefore only 632 clients received a higher than standard rate.
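The significance statement for offer4 in reg3_1 can be checked by hand. The following sketch copies the estimate and the standard error from the summary output above and recomputes the z value and the two-sided p-value; the variable names are chosen for illustration only.

```r
# Estimate and standard error of offer4, copied from summary(reg3_1) above
estimate  <- -0.01998
std_error <- 0.00361

# z value: the estimate divided by its standard error
z_value <- estimate / std_error

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_value))

print(round(z_value, 2))
print(p_value)
```

Because the inputs are rounded, the recomputed values can differ slightly from the printed summary, but the p-value stays far below the 0.1 percent threshold.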

4.1.2 b) Just press Check to perform the regression.

reg3_2 <- glm(applied ~ offer4 + low + med, family = binomial(link = "probit"), data=filter(dat, normrate_more == 1))
summary(reg3_2)
## 
## Call:
## glm(formula = applied ~ offer4 + low + med, family = binomial(link = "probit"), 
##     data = filter(dat, normrate_more == 1))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7067  -0.3578  -0.3062  -0.2597   2.6738  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  0.28665    1.02742   0.279   0.7802  
## offer4      -0.14897    0.07661  -1.944   0.0519 .
## low          0.13609    0.32499   0.419   0.6754  
## med          0.18578    0.26363   0.705   0.4810  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 308.89  on 631  degrees of freedom
## Residual deviance: 291.79  on 628  degrees of freedom
## AIC: 299.79
## 
## Number of Fisher Scoring iterations: 6

In order to interpret the results more precisely, we will now calculate the corresponding marginal effects of both regressions. To do this we use the showreg() function from the regtools package, which can show marginal effects for a glm model and allows for robust standard errors.
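To see why a probit coefficient differs from a marginal effect, the following sketch computes marginal effects by hand in base R. It uses simulated data with hypothetical variable names (rate, lowrsk, takeup), not the study data. It shows two common definitions, the marginal effect at the mean and the average marginal effect, and makes no claim about which one showreg() reports.

```r
# Simulated data for illustration only (hypothetical, not the study data)
set.seed(123)
n      <- 5000
rate   <- runif(n, min = 3.25, max = 11.75)   # hypothetical monthly offer rate
lowrsk <- rbinom(n, size = 1, prob = 0.3)     # hypothetical risk dummy
takeup <- rbinom(n, size = 1, prob = pnorm(-1.3 - 0.02 * rate + 0.5 * lowrsk))
sim    <- data.frame(takeup, rate, lowrsk)

fit  <- glm(takeup ~ rate + lowrsk, family = binomial(link = "probit"), data = sim)
beta <- coef(fit)

# Marginal effect at the mean: normal density at the mean linear index
# multiplied by the coefficient of the continuous variable
x_bar       <- c(1, mean(sim$rate), mean(sim$lowrsk))
mfx_at_mean <- dnorm(sum(x_bar * beta)) * beta["rate"]

# Average marginal effect: average the density over all observations instead
mfx_average <- mean(dnorm(predict(fit, type = "link"))) * beta["rate"]

print(c(coefficient = unname(beta["rate"]),
        at_mean     = unname(mfx_at_mean),
        average     = unname(mfx_average)))
```

Since the standard normal density never exceeds about 0.4, both marginal effects are smaller in absolute value than the raw coefficient while keeping its sign.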


Info: showreg()

The showreg() function from the regtools package extends and wraps either stargazer or the screenreg, texreg and htmlreg functions of the texreg package. It allows for robust standard errors and can show marginal effects in glm models. To show the marginal effects of a probit regression it uses the argument coef.transform="mfx", which relies on the mfx package.

For further information check: https://rdrr.io/github/skranz/regtools/src/R/showreg.r


4.1.3) Just press Check to perform the two above explained regressions and show the marginal probit effects of the two regressions.

showreg(list("(1)"=reg3_1,"(2)"=reg3_2), coef.transform=c("mfx", "mfx"), omit.coef = "(Intercept)", digits=3, type="html")
Statistical models
                 (1)           (2)
offer4           -0.003***     -0.017*
                 (0.001)       (0.009)
low              0.112***      0.017
                 (0.006)       (0.043)
med              0.119***      0.024
                 (0.006)       (0.038)
waved2           -0.007
                 (0.004)
waved3           -0.011**
                 (0.004)
AIC              29464.760     299.792
BIC              29518.049     317.587
Log Likelihood   -14726.380    -145.896
Deviance         29452.760     291.792
Num. obs.        53178         632
*** p < 0.001; ** p < 0.01; * p < 0.05

Column (1) presents the probit marginal effects for clients who received offers at or below the standard rate of their risk category. A 100-basis-point increase in the monthly interest rate is associated with a 0.3 percentage point lower take-up. This seems to be a very small effect, since we know from exercise 2 that the price ranges between 3.25% and 11.75%. A price decrease from the maximum to the minimum would therefore increase take-up by only about 2.6 percentage points ((3.25 - 11.75) x (-0.003) ≈ 0.026). In column (2) we can see that a 100-basis-point increase in the monthly interest rate has a larger effect on clients who received offers higher than the standard rate of their risk category. In this case the price increase results in a 1.7 percentage point lower take-up. The price sensitivity is thus nearly six times higher for clients who received offers above the standard rate of their risk category.
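The two back-of-the-envelope numbers in this paragraph can be reproduced with a few lines of R. The sketch below only uses the marginal effects from columns (1) and (2) and the price range mentioned in the text; the variable names are chosen for illustration.

```r
# Marginal effects of offer4, copied from the table above
mfx_below <- -0.003   # column (1): offers at or below the standard rate
mfx_above <- -0.017   # column (2): offers above the standard rate

# Price range of the monthly offer rate in percent
rate_max <- 11.75
rate_min <- 3.25

# Change in take-up probability when the rate falls from the maximum to the minimum
takeup_change <- (rate_min - rate_max) * mfx_below

# Ratio of the two price sensitivities
sensitivity_ratio <- mfx_above / mfx_below

print(takeup_change)
print(sensitivity_ratio)
```

The first number is a change in probability, so it has to be multiplied by 100 to read it in percentage points.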

To estimate how take-up changed when the lender offered a rate higher than the client's standard rate, we regress applied on normrate_more and the experimental validation variables. The regression shows that higher interest rates reduced the level of take-up: clients who received a higher than standard rate were 3 percentage points less likely to apply for a loan.

4.1.4) Press Check to perform the regression and present the probit marginal effects.

reg3_3 <- glm(applied ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data=dat)
showreg(list("(3)"=reg3_3), coef.transform=c("mfx"), omit.coef = "(Intercept)", digits=3, type="html")
Statistical models
                 (3)
normrate_more    -0.030***
                 (0.008)
low              0.124***
                 (0.005)
med              0.124***
                 (0.006)
waved2           -0.008
                 (0.004)
waved3           -0.012**
                 (0.004)
AIC              29791.929
BIC              29845.288
Log Likelihood   -14889.965
Deviance         29779.929
Num. obs.        53810
*** p < 0.001; ** p < 0.01; * p < 0.05

Price Elasticity

Another way to classify the estimated results is to calculate the take-up elasticity. In an economic sense, the elasticity is the ratio of the percentage change in one variable to the percentage change in another variable, provided that a non-coincidental connection between the two variables exists.

We can use our result to calculate the elasticity. To do so we need the average value of the response variable applied and the average value of the explanatory variable offer4. Then we can use the formula for the elasticity of demand:

elasticity = marginal effect x (mean of explanatory variable / mean of response variable)

Quiz: Calculate the price elasticity of demand for our sub-sample in regression reg3_1. The mean offer rate is 8 and the mean of applied is 0.085. Type your answer, rounded to two decimal places, in the box below.

Answer: -0.28
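The quiz answer follows directly from the elasticity formula. A minimal sketch of the calculation in R, with the numbers given in the quiz and variable names chosen for illustration:

```r
# Inputs from the quiz: marginal effect of offer4 in column (1) and the two means
marginal_effect <- -0.003
mean_offer_rate <- 8
mean_applied    <- 0.085

# elasticity = marginal effect x (mean of explanatory variable / mean of response variable)
elasticity <- marginal_effect * mean_offer_rate / mean_applied

print(round(elasticity, 2))
```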

Graphical Analysis

We can illustrate the relationship between the monthly offer rate and loan applications, that is the price sensitivity of demand, graphically in a partial regression plot. For a partial regression plot we first compute the residuals from regressing the explanatory variable offer4 on the remaining independent variables. We then compute the residuals from regressing the response variable applied on the same remaining independent variables. The partial regression plot is a plot of the second set of residuals against the first.

We use another geom function called geom_smooth() to illustrate the relationship with a smoothing line. We also have to add se=FALSE and method="loess" to the geom_smooth() command. We set se=FALSE because we would get an overflow otherwise, since our data set is quite large.
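The idea behind the partial regression plot can be checked on a small example. The sketch below uses simulated data with hypothetical variables x, y and z (not the study data) and shows that the slope of a regression of the y residuals on the x residuals equals the coefficient of x in the full linear regression. This is the Frisch-Waugh-Lovell result for linear models; the loess line in our plot is a flexible version of the same residual-on-residual relationship.

```r
# Simulated data for illustration only (hypothetical variables)
set.seed(42)
n <- 1000
z <- rnorm(n)                            # control variable
x <- 0.5 * z + rnorm(n)                  # explanatory variable of interest
y <- 1 - 0.3 * x + 0.8 * z + rnorm(n)    # response variable

# Full regression with the control variable
full <- lm(y ~ x + z)

# Partial regression: residuals of x and of y after removing the control
x_res   <- resid(lm(x ~ z))
y_res   <- resid(lm(y ~ z))
partial <- lm(y_res ~ x_res)

# Both slopes coincide
slope_full    <- unname(coef(full)["x"])
slope_partial <- unname(coef(partial)["x_res"])
print(c(full = slope_full, partial = slope_partial))
```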

4.1.5) Replace the question marks with the missing arguments of the geom_smooth() function. Afterwards press Check.

x_resid <- resid(lm(offer4 ~ low + med + waved2 + waved3, data=dat))
y_resid <- resid(lm(applied ~ low + med + waved2 + waved3, data=dat))
# ggplot(dat, aes(x=x_resid, y=y_resid)) +
#   geom_smooth(method="???", se=???, span=0.5)

ggplot(dat, aes(x=x_resid, y=y_resid)) +
  geom_smooth(method="loess", se=FALSE, span=0.5)
## `geom_smooth()` using formula 'y ~ x'

The demand curve confirms our thesis that a decrease of the interest rate is associated with a higher take-up and an increase of the offer rate with a lower take-up. However, a kink at an interest rate increase of approximately 150 basis points is particularly noticeable. It shows that if we increase the interest rate by more than 150 basis points, the demand curve, that is the take-up, falls strongly.

Possible Explanations for the Kink in the Demand-Curve

One explanation for the kink is selection based on rates of return. Since our sample consists only of prior borrowers, it could be that everyone in the experiment has a discount or return rate approximately equal to the lender’s standard rates. Hence, prior borrowers were roughly indifferent about borrowing at their standard rate, and a rate increase leaves them strictly unwilling to borrow. There are two problems with this explanation. First, it delivers the counterfactual prediction that lowering the interest rate should affect only the intensive margin, since everyone in the sample had already demonstrated a willingness to borrow at standard rates. Second, it seems likely that rates of return for potential borrowers vary over time with the severity of liquidity constraints and opportunity sets. In this case, we would not necessarily expect to find an indifference point at standard rates, and it would be unlikely that selection on rates of return is a sufficient explanation for the kink.

Borrowing from an Outside-Lender

A second explanation for the kink in the demand curve could be that clients borrowed elsewhere if the offer rate was too high. To test this hypothesis we will perform the same regressions as in the previous tasks, but this time we focus on clients who ended up borrowing from another financial institution. For this purpose the lender obtained credit bureau data, which is captured by the variable tookup_outside_only in our data set.

4.1.6) Press Check to perform the regression of clients who borrowed from other financial institutions on the offer rate. Afterwards use the regression results to answer the question below.

reg3_4 <- glm(tookup_outside_only ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data = filter(dat, normrate_less == 1))

reg3_5 <- glm(tookup_outside_only ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(dat, normrate_more == 1))

reg3_6 <- glm(tookup_outside_only ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = dat)

showreg(list("(4)"=reg3_4,"(5)"=reg3_5, "(6)"=reg3_6), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)",  digits=3, type="html", omit.stat = c("low", "med", "waved2", "waved3"))
Statistical models
                 (4)           (5)         (6)
offer4           0.001         -0.010
                 (0.001)       (0.018)
low              0.028***      0.007       0.025***
                 (0.006)       (0.079)     (0.006)
med              -0.005        0.068       -0.005
                 (0.006)       (0.067)     (0.006)
waved2           -0.054***                 -0.053***
                 (0.007)                   (0.007)
waved3           -0.050***                 -0.049***
                 (0.007)                   (0.007)
normrate_more                              0.005
                                           (0.017)
AIC              56387.254     755.138     57138.454
BIC              56440.542     772.933     57191.813
Log Likelihood   -28187.627    -373.569    -28563.227
Deviance         56375.254     747.138     57126.454
Num. obs.        53178         632         53810
*** p < 0.001; ** p < 0.01; * p < 0.05

Quiz: Did higher offer rates induce more borrowing from other financial institutions?

  • A 100-basis-point increase is associated with a 0.1 percentage point higher take-up from an outside lender for clients at or below the standard rate and a 0.1 percentage point lower take-up for clients who received offers higher than the standard rate of their risk category. [ ]

  • There is no significant relationship between higher offer rates and borrowing from an outside lender. [x]

  • A 100-basis-point increase is associated with a 0.1 percentage point lower take-up from an outside lender for clients at or below the standard rate and a 0.1 percentage point higher take-up for clients who received offers higher than the standard rate of their risk category. [ ]

The point estimates in columns (4) and (5) suggest a positive relationship between the offer rate and borrowing from another financial institution for clients who received an offer rate at or below their standard one, and a negative relationship for clients who received a higher offer rate than the standard for their risk category. In addition, column (6) points to slightly more outside borrowing among clients with a higher than standard rate. However, none of these estimates is statistically significant, and the confidence intervals rule out economically large substitution. Therefore we cannot say that higher interest rates influenced the clients' borrowing from other financial institutions. What we cannot rule out are other financing opportunities such as family, friends or moneylenders.

Borrowing after the Deadline

There is another possible explanation for the kink in the demand curve. Clients whose offer rate was higher than the standard rate could have waited until the deadline had passed and then borrowed at the lower standard rate. This is testable by examining the post-deadline borrowing behavior of the clients. A logical outcome would be that clients with a higher offer rate than the standard rate borrow after the deadline has expired. In order to test this hypothesis we will perform the regressions once again, but this time with post-deadline borrowing as the dependent variable. To find out which variable describes those clients, think back to the first exercise.

4.1.7) This time you only have to replace the question marks with the variable that indicates whether clients borrowed after the deadline. If you think your answer is correct, press Check. If you cannot find the variable in the data, just press Hint.

# reg3_7 <- glm(??? ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(dat, normrate_less == 1))
# 
# reg3_8 <- glm(??? ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(dat, normrate_more == 1))
# 
# reg3_9 <- glm(??? ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = dat)
# 
# showreg(list("(7)"=reg3_7,"(8)"=reg3_8, "(9)"=reg3_9), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", output = "html", digits=3)


reg3_7 <- glm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(dat, normrate_less == 1))

reg3_8 <- glm(tookup_afterdead_enforced ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(dat, normrate_more == 1))

reg3_9 <- glm(tookup_afterdead_enforced ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = dat)

showreg(list("(7)"=reg3_7,"(8)"=reg3_8, "(9)"=reg3_9), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", digits=3, type="html")
Statistical models
                 (7)           (8)         (9)
offer4           0.000         -0.012
                 (0.001)       (0.015)
low              0.201***      0.177*      0.200***
                 (0.007)       (0.083)     (0.006)
med              0.155***      0.155*      0.154***
                 (0.007)       (0.066)     (0.007)
waved2           -0.065***                 -0.065***
                 (0.005)                   (0.005)
waved3           -0.065***                 -0.065***
                 (0.005)                   (0.005)
normrate_more                              -0.036**
                                           (0.011)
AIC              42215.085     571.133     42779.521
BIC              42268.373     588.928     42832.880
Log Likelihood   -21101.543    -281.566    -21383.761
Deviance         42203.085     563.133     42767.521
Num. obs.        53178         632         53810
*** p < 0.001; ** p < 0.01; * p < 0.05

Award: Daniel Kahneman

The psychologists Daniel Kahneman and Amos Tversky identified a specific pattern of consumer behavior in decisions under risk. In their 1979 article “Prospect Theory: An Analysis of Decision under Risk” they describe that the experience of losing money weighs more heavily for a consumer than the prospect of gaining the same amount of money.


In fact, we get an unexpected result: higher offer rates are associated with less post-deadline borrowing. The result of the regression reg3_9 in column (9) indicates that clients with a higher than standard offer rate were 3.6 percentage points less likely to borrow after the deadline. However, the results in columns (7) and (8) do not suggest any economically large substitution.

Our pattern of results with respect to timing is consistent with switching costs. The regression results of this exercise show that both pre- and post-deadline borrowing decrease in price. An explanation for our result could therefore be that clients applied for a loan from other financial institutions before the deadline and found it too costly to switch back to our lender after the deadline. Our estimated behavioral pattern also corresponds with the general assumption that consumers evaluate prices in comparison to prior experiences. In this context a price increase is often seen as more decision-relevant than a potential gain.

Our results on non-linearities in price sensitivity seem to be consistent with the explanations of general consumer behavior by Daniel Kahneman and Amos Tversky (1979) and with the possibility of other financing options in informal markets combined with switching costs.

Exercise 4.2 – Price Elasticity of Loan Size

In this second section of exercise 4 we want to examine the price sensitivity of loan size. The loan size is expressed in South African Rand (R). For your information: R1.00 corresponds to approximately 0.067 US-Dollar (https://www.xe.com/currencyconverter/convert/?Amount=1&From=ZAR&To=USD, 03/30/2021).

Our analysis in this part relies on three conditions and sets of controls: conditioning on borrowing, branch fixed effects, and additional controls for demographics and credit risk.

4.2.1) Press Check to load the data once again.

dat <- read_dta("kz_demandelasts_aer08.dta")

Intensive Margin - Loan Size

Unconditional Loan Size

In the first part of this exercise we focus on clients who were randomly assigned equal offer and contract rates that were also at or below the lender’s standard rate for their individual risk category. We want to estimate the price sensitivity of the amount borrowed, unconditional on borrowing, for all of these clients, borrowers and non-borrowers alike.

In our first subsample of clients the dependent variable loansize only includes pre-deadline borrowing, and we condition just on the risk category (low + med) and the mailer wave (waved2 + waved3). This time, however, we will perform a regression with felm() from the lfe package. It is similar to lm(), which we used in exercise 2, but felm() is used to fit linear models with group fixed effects. It uses the method of alternating projections to sweep out multiple group effects from the normal equations before estimating the remaining coefficients with OLS.


Info: Linear regression with multiple group fixed effects with felm()

The felm() function is part of the lfe package and is intended for large data sets with multiple group fixed effects. The basic template consists of the dependent variable, a formula of four parts and the data. The first part consists of the ordinary independent variables \(x_1 + x_2\) and the second part of the factors to be projected out, that is our fixed effects \(f_1 + f_2\). The third part is the IV specification \((Q|W \sim x_3 + x_4)\) and the fourth part is the cluster specification for the standard errors. Combined, it looks for example like this:

reg <- felm(y ~ x1 + x2 | f1 + f2 | (Q|W ~ x3 + x4) | clu1 + clu2, data=data)

In our analysis we will not use the IV specification. We can specify it, or any other part we are not using, as 0.

For further information check out: https://rdrr.io/cran/lfe/man/felm.html
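What "sweeping out" a group effect means can be sketched in base R for the simplest case of a single fixed effect. The example below uses simulated data with hypothetical names (branch, rate, size) and is not the implementation used by lfe. With only one factor, projecting out the group effect amounts to subtracting group means, and the slope equals the one from a regression with one dummy per group.

```r
# Simulated data for illustration only (hypothetical variables)
set.seed(7)
n      <- 600
branch <- factor(sample(1:20, n, replace = TRUE))    # group identifier
alpha  <- rnorm(20, sd = 50)[as.integer(branch)]     # one effect per branch
rate   <- runif(n, 3.25, 11.75) + 0.01 * alpha       # regressor correlated with the effect
size   <- 150 - 4 * rate + alpha + rnorm(n, sd = 30) # outcome

# (a) Fixed effects as dummy variables
fit_dummies <- lm(size ~ rate + branch)

# (b) Within transformation: subtract the branch means, then run OLS
size_dm    <- size - ave(size, branch)
rate_dm    <- rate - ave(rate, branch)
fit_within <- lm(size_dm ~ rate_dm - 1)

slope_dummies <- unname(coef(fit_dummies)["rate"])
slope_within  <- unname(coef(fit_within)["rate_dm"])
print(c(dummies = slope_dummies, within = slope_within))
```

With several fixed effects at once, simple demeaning is no longer enough in general, which is where the alternating projections described above come in. The sketch also only compares point estimates; standard errors and clustering are not covered.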


4.2.2) Perform a linear regression with fixed effects stored in reg4_1. Regress the dependent variable loansize on the ordinary independent variable offer4 and the fixed effects low + med + waved2 + waved3 and cluster it by branchuse. Replace the question marks with the missing variables. If you have no clue how to perform the regression, just press hint for help.

# reg4_1 <- felm(??? ~ ??? | ??? |0| ???,  data = filter(dat, offer4==final4, normrate_less==1))
# summary(reg4_1)

reg4_1 <- felm(loansize ~ offer4 |low + med + waved2 + waved3 |0| branchuse,  data = filter(dat, offer4==final4, normrate_less==1))
summary(reg4_1)
## 
## Call:
##    felm(formula = loansize ~ offer4 | low + med + waved2 + waved3 |      0 | branchuse, data = filter(dat, offer4 == final4, normrate_less ==      1)) 
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -334.2  -77.2  -59.7  -49.8 9771.0 
## 
## Coefficients:
##        Estimate Cluster s.e. t value  Pr(>|t|)    
## offer4   -4.368        1.093  -3.996 0.0000646 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 506.3 on 31225 degrees of freedom
## Multiple R-squared(full model): 0.03295   Adjusted R-squared: 0.03279 
## Multiple R-squared(proj model): 0.0004047   Adjusted R-squared: 0.0002446 
## F-statistic(full model, *iid*):212.8 on 5 and 31225 DF, p-value: < 0.00000000000000022 
## F-statistic(proj model): 15.97 on 1 and 107 DF, p-value: 0.0001187 
## *** Standard errors may be too high due to more than 2 groups and exactDOF=FALSE

The estimated coefficient shows that each 100-basis-point increase in the interest rate is associated with a decrease of the loan size by approximately R4.4. Given the unconditional average loan size of R106 and the average offered interest rate of 7.8 percent, the implied elasticity is -0.32, which implies that a one percent higher interest rate is associated with a 0.32 percent lower loan size.
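The implied elasticity can be reproduced with a small helper function. The function name elasticity_at_means() is hypothetical and only serves this illustration; the inputs are the coefficient from reg4_1 and the two averages mentioned in the text.

```r
# Hypothetical helper: elasticity evaluated at the sample means
elasticity_at_means <- function(coefficient, mean_x, mean_y) {
  coefficient * mean_x / mean_y
}

# Coefficient of offer4 from reg4_1, average offer rate and average loan size
loan_size_elasticity <- elasticity_at_means(coefficient = -4.368,
                                            mean_x = 7.8,
                                            mean_y = 106)

print(round(loan_size_elasticity, 2))
```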

We also want to explore the loan price demand estimate for a sub-sample of non-borrowers. We will re-estimate the model from the previous regression, but add additional control variables and branch fixed effects. The additional controls added to unconditional specifications include: quadratics in internal credit score, external credit score, and gross income at time of pre-approval, months since last loan with lender, number of prior loans with lender, gender, number of dependents, marital status, quadratic in age, rural residence, education, and province. Controls for conditional specifications include net income at the time of approval. The omit argument of the stargazer() function hides the control variables in our stargazer output below.

4.2.3) Press Check to re-estimate the regression in 4.2.2) with additional control variables for non-borrowers.

dat <- dat %>% mutate(grossincomesq = grossincome^2, agesq = age^2, appscoresq = appscore^2, itcscoresq = itcscore^2, sales_netincomesq = sales_netincome^2, sales_grossincomesq = sales_grossincome^2)

reg4_2 <- felm(loansize ~ offer4 + grossincome + grossincomesq + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1))

stargazer(reg4_1, reg4_2, omit = c("grossincome", "grossincomesq", "dormancy", "trcount", "dependants","age", "agesq", "appscore", "appscoresq" ,"itcscore" , "itcscoresq", "trcount"), type="html", header = FALSE, se=list(coef(summary(reg4_1, reg4_2, cluster = c("html")))[, 2]))
Dependent variable: loansize
                      (1)                     (2)
offer4                -4.368***               -4.394***
                      (1.093)                 (1.143)
Observations          31,231                  28,197
R2                    0.033                   0.062
Adjusted R2           0.033                   0.057
Residual Std. Error   506.350 (df = 31225)    499.250 (df = 28061)
Note: *p<0.1; **p<0.05; ***p<0.01

As we can see, the result does not change for non-borrowers. This seems to be consistent with non-borrowers having the same intensive margin price sensitivity as borrowers, which we saw in exercise 4.1. However, as discussed in the previous exercises, the result is difficult to interpret because the price sensitivity is nonzero and inconsistent on the extensive margin. The loan size demanded may be affected by characteristics other than the risk category. Therefore, an interpretation of our loan size elasticity results is more useful for a subsample of clients who actually borrowed from our lender.

Conditional Loan Size

Now we perform the regression conditional on borrowing, in other words for borrowers only. In column (1) we run the regression of the loan size on the standard conditions of the experiment for borrowers only. In column (2) we add the additional controls for selection, and in column (3) we run a Tobit regression to find out whether the loan size demand may be censored by supply constraints.


Info: Tobit Regression

The elasticity of demand can be calculated at the sample means with the formula:

elasticity = estimated coefficient x (mean offer rate / mean loan size)


4.2.4) Press Check to run the above mentioned regressions.

dat <- dat %>% mutate(grossincomesq = grossincome^2, agesq = age^2, appscoresq = appscore^2, itcscoresq = itcscore^2, sales_netincomesq = sales_netincome^2, sales_grossincomesq = sales_grossincome^2)

reg4_3 <- felm(loansize ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))

reg4_4 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))

reg4_5 <- censReg(loansize ~ offer4 + low + med + waved2 + waved3 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq + female + married  + rural + edhi + appscore0 + itczero + province, data = dat)

stargazer(reg4_3, reg4_4, reg4_5, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province", "sales_grossincome", "sales_grossincomesq"), type="html", header = FALSE, digits=5)
## Error in attr(ll, "df") <- sum(activePar(object)): attempt to set an attribute on NULL

As we can see in column (1), an increase of the offer rate by 100 basis points for borrowers is associated with a R25.876 lower loan size. If we apply the formula for the loan demand elasticity to our result, we get an implied elasticity of -0.13, which seems comparatively small. Column (2) presents our estimate of loan size price sensitivity conditional on borrowing and with the additional control variables. It estimates a slightly larger decrease of the loan size of R33.715 compared to column (1) and a larger elasticity of -0.17. The result of the Tobit regression in column (3), without branch fixed effects, does not change significantly in comparison to the regression in column (2), which indicates that the loan size demand is not censored by supply constraints.

We can portray our results of the conditional loan size in a demand curve graphically as we did in task 4.1.5).

4.2.5) Press Check to generate the demand curve for borrowers only.

data_plot2 <- dat %>% filter(tookup==1)

x_plot2 <- lm(offer4 ~ low + med + waved2 + waved3, data=data_plot2)
y_plot2 <- lm(loansize ~ low + med + waved2 + waved3, data=data_plot2) 
 
ggplot(data_plot2, aes(x=x_plot2$residuals, y=y_plot2$residuals)) +
  geom_smooth(method = "loess", span=0.5) +
  ylab("Loansize") +
  xlab("Offer Rate")
## `geom_smooth()` using formula 'y ~ x'

The graph shows that an increase of the interest rate has a negative effect on the loan size and a decrease of the offer rate a positive effect on the amount borrowed.

Log-Log-Specifications

Now we want to take a look at the log-log specifications of our previous regressions. We can use the log-log regression estimates as an alternative way to determine the elasticities of demand. The difference between a normal linear-linear model and a log-log model is that we use the logarithmized values of the \(y\) and \(x\) variables. The estimate of a log-log regression can be interpreted as follows: if we increase the \(x\) variable by one percent, \(y\) changes on average by \(\beta_1\) percent.
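That the slope of a log-log regression can be read as an elasticity is easy to check on simulated data. The sketch below generates a hypothetical loan size with a constant price elasticity of -0.15 (an assumed value, not an estimate from the study) and recovers it with lm() on the logarithmized variables.

```r
# Simulated data for illustration only (hypothetical variables and parameters)
set.seed(2021)
n               <- 5000
rate            <- runif(n, 3.25, 11.75)    # offer rate in percent
true_elasticity <- -0.15                    # assumed constant elasticity
size            <- 2000 * rate^true_elasticity * exp(rnorm(n, sd = 0.2))

# Log-log regression: the slope estimates the elasticity
fit_loglog <- lm(log(size) ~ log(rate))
estimated  <- unname(coef(fit_loglog)["log(rate)"])

print(round(estimated, 3))
```

The estimate differs from the assumed value only by sampling noise, which shrinks as the number of observations grows.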

4.2.6) Change the loansize and offer4 variables to their logarithmized (ln) values. Press Check afterwards.

# reg4_6 <- felm(loansize ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(dat, offer4==final4, tookup==1, normrate_less==1))
# 
# reg4_7 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))
# 
# reg4_8 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))
# 
# stargazer(reg4_6, reg4_7, reg4_8, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province"), type="html", header = FALSE)

reg4_6 <- felm(lnloansize ~ lnoffer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(dat, offer4==final4, tookup==1, normrate_less==1))

reg4_7 <- felm(lnloansize ~ lnoffer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))

reg4_8 <- felm(lnloansize ~ lnoffer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married  + rural + edhi + appscore0 + itczero + province |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1, tookup==1))

stargazer(reg4_6, reg4_7, reg4_8, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province"), type="html", header = FALSE)
Dependent variable: lnloansize
                     (1)                (2)                (3)
lnoffer4             -0.113**           -0.143***          -0.138***
                     (0.049)            (0.040)            (0.041)
sales_netincome                         0.0001***          0.0001***
                                        (0.00001)          (0.00001)
sales_netincomesq                       -0.000***          -0.000***
                                        (0.000)            (0.000)
Observations         2,325              2,304              2,304
R2                   0.058              0.342              0.297
Adjusted R2          0.056              0.308              0.288
Residual Std. Error  0.714 (df = 2319)  0.610 (df = 2190)  0.619 (df = 2274)
Note: *p<0.1; **p<0.05; ***p<0.01

The log-log specifications estimate a loan size elasticity of -0.11 without additional control variables and branch fixed effects in column (1), and of -0.14 with additional controls and branch fixed effects in column (2). Column (3), which keeps the additional controls but drops the branch fixed effects, also yields an elasticity of about -0.14.
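
To make the size of these elasticities concrete, the following minimal sketch converts a log-log coefficient into the implied percentage change in loan size for a given percentage change in the offer rate. The function name pct_change_loansize and the 10 percent price increase are illustrative assumptions; the coefficients are the point estimates for lnoffer4 from the table above.

```r
# Illustrative helper (not part of the problem set): in a log-log model
# ln(y) = a + b * ln(x), multiplying x by (1 + g) multiplies y by (1 + g)^b.
pct_change_loansize <- function(elasticity, pct_price_change) {
  100 * ((1 + pct_price_change / 100)^elasticity - 1)
}

# Point estimates for lnoffer4 from columns (1) to (3)
elasticities <- c(col1 = -0.113, col2 = -0.143, col3 = -0.138)

# Implied change in loan size (in percent) for a 10 percent higher offer rate
implied <- pct_change_loansize(elasticities, 10)
print(round(implied, 2))
```

Under these assumptions, a 10 percent higher offer rate lowers the loan size by roughly 1.1 to 1.4 percent in every column, which is what a low elasticity means in practice.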

Overall, we still find elasticities of loan size with respect to price that are quite low.


Award: Alfred Marshall

Alfred Marshall was the first to develop the standard supply and demand graph. In his most important book, Principles of Economics, Marshall emphasized that the price and output of a good are determined by both supply and demand: the two curves are like scissor blades that intersect at equilibrium. Did you know that the term “elasticity of demand” is also credited to Alfred Marshall? He described the price elasticity of demand as follows: “And we may say generally: the elasticity of demand in a market is great or small according as the amount demanded increases much or little for a given fall in price, and diminishes much or little for a given rise in price.”

Fig. 7 | Alfred Marshall (retrieved from http://www.hagen-bobzin.de/mikro/Marshall.html)


Exercise 5 – Pricing Strategy

In this section we want to determine the optimal pricing strategy for our lender. We combine the average price elasticity results from the previous section with additional information on revenues and repayment.

We already know how an adjusted offer rate affects loan size and take-up for clients. Now we want to determine the consequences for the lender, in particular whether a higher offer rate would be more profitable. We are especially interested in the effects of a changed offer rate on the lender’s gross revenue and on the clients’ loan defaults. Here we encounter a common criticism of microcredit: it is often assumed that the best profit-maximizing strategy for lenders is to charge horrendous interest rates, which seems logical at first sight. We want to evaluate this assumption and answer the question whether the lender should raise or cut offer rates.

5.1.1) Press Check to load the data once again.

dat <- read_dta("kz_demandelasts_aer08.dta")

Short-Run Pricing Strategy

To examine this claim, let us first look at the lender’s optimal price in the short run. For this we aggregate the revenue and repayment results (grossinterest) over the entire sample frame for clients who were offered a rate at or below the lender’s standard rate (normrate_less == 1). This provides us with information on the price sensitivity of gross revenue obtained on initial pre-deadline borrowing. In other words, it shows us how an adjusted interest rate affects the lender’s gross revenue. Hence, we want to perform a regression of the lender’s gross revenue on the offer rate. Afterwards we can interpret the regression estimates by answering the question below.

5.1.2) Just press Check to perform the regression of grossinterest on offer4.

reg5_1 <- felm(grossinterest ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1))

stargazer(reg5_1, type="html", header=FALSE)
Dependent variable:
grossinterest
offer4 2.553***
(0.438)
Observations 31,231
R2 0.020
Adjusted R2 0.020
Residual Std. Error 264.218 (df = 31225)
Note: *p<0.1; **p<0.05; ***p<0.01

Complete the following sentence (rounded to one decimal place) and write your answer in the answer box below:

Quiz: The gross revenue result implies that a 100-basis-point increase of the offer rate increases the gross revenue by R???.

Answer: 2.6

Our result implies that a higher offer rate would generate higher revenue for our lender. However, this seems to contradict our demand curve analysis in task 4.1.5), which suggested that beyond a certain point a higher loan price reduces loan take-up significantly. Thus, let us have a look at the curve of revenue with respect to price.

5.1.3) Just press Check and the code below will create the demand curve.

data_plot3 <- dat %>% filter(offer4==final4, normrate_less==1)
x_resid3 <- resid(felm(offer4 ~ low + med + waved2 + waved3, data=data_plot3))
y_resid3 <- resid(felm(grossinterest ~ low + med + waved2 + waved3, data=data_plot3))

ggplot(data_plot3, aes(x=x_resid3, y=y_resid3)) +
  geom_smooth(method = "loess", span=0.8, se=FALSE) +
  xlab("Offer Rate") +
  ylab("Gross Revenue per Client")
## `geom_smooth()` using formula 'y ~ x'

The lender’s gross revenue curve slopes slightly upward over the range of rates below those offered under standard operations. However, for offer rate increases of more than approximately 300 basis points, the additional revenue from a price increase clearly declines. An explanation for this kink could be our finding in exercise 4.1 that beyond a certain point a higher interest rate reduces loan take-up and therefore also the lender’s total revenue.
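
The plot code above first regresses both the offer rate and the gross revenue on the risk and wave dummies and then plots residual against residual. A minimal sketch with simulated data illustrates why this is a sensible way to visualize a relationship net of controls: by the Frisch-Waugh-Lovell theorem, the slope between the two residual series equals the slope on the regressor in the full regression with controls. All variable names and numbers below are hypothetical, and base R's lm stands in for felm.

```r
# Simulated data; every name and number here is a hypothetical stand-in.
set.seed(1)
n <- 500
risk <- factor(sample(c("low", "med", "high"), n, replace = TRUE))
rate <- 8 + 2 * as.integer(risk) + rnorm(n)            # offer rate depends on risk
revenue <- 20 + 2.5 * rate + 5 * as.integer(risk) + rnorm(n, sd = 10)

# Full regression: revenue on the rate, controlling for the risk dummies
full <- lm(revenue ~ rate + risk)

# Partial out the risk dummies from both variables ...
rate_resid <- resid(lm(rate ~ risk))
revenue_resid <- resid(lm(revenue ~ risk))

# ... and regress residual on residual
partial <- lm(revenue_resid ~ rate_resid)

slope_full <- unname(coef(full)["rate"])
slope_partial <- unname(coef(partial)["rate_resid"])
print(c(full = slope_full, partial = slope_partial))
```

Both slopes coincide, so a smoothed curve through the residuals shows the revenue-price relationship after the controls have been removed from both axes. Note that the axes are then deviations from the group means rather than raw rates and revenues.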

Having said this, the lender could still raise the offer rate to a certain degree to increase its gross revenue. However, a higher loan price could also affect the clients’ solvency through moral hazard, adverse selection, or other shocks that the clients cannot offset. Our regression in 5.1.4) suggests that the same loan price increase also increases loan default: the average past-due amount rises by 12.2R per borrower for each 100-basis-point increase of the offer rate.

5.1.4) Press the button with the Check label to show the regression results.

reg5_2 <- felm(pstdue_average ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(dat, offer4==final4, normrate_less==1))

stargazer(reg5_2, align=TRUE, covariate.labels=c("interest rate in pp terms (e.g., 8.2)"), no.space=TRUE, type="html")
Dependent variable:
pstdue_average
interest rate in pp terms (e.g., 8.2) 12.161***
(3.523)
Observations 2,325
R2 0.050
Adjusted R2 0.048
Residual Std. Error 365.427 (df = 2319)
Note: *p<0.1; **p<0.05; ***p<0.01

Taken together with the kink in the demand curve at the lender’s standard rates, our results strongly suggest that the lender should not raise the offer rates: beyond that point a higher rate would lower gross revenue and increase defaults. But the question whether the lender should cut the offer rates to generate higher revenue still remains.

Quiz: What do you think, would a lower offer rate increase the lender’s revenue significantly?

  • Yes. [ ]

  • No. [x]

The regression in 5.1.2) suggests that a 100-basis-point lower loan price would lead to 2.6R less revenue per client, while the regression in 5.1.4) suggests that it would reduce the past-due amount by 12.2R per borrower. Knowing that the average take-up rate is 7.4 percent, an offer rate cut produces 0.9R (12.2R * 0.074) more repayment, and thus revenue, per client offered the loan, but in total a 1.7R (-2.6R + 0.9R) lower net revenue per client.
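
The back-of-the-envelope calculation above can be retraced in a few lines of R. The variable names are illustrative; the numbers are the rounded estimates and the take-up rate quoted in the text.

```r
# Effect of a 100-basis-point rate CUT, using the rounded estimates from the text
revenue_change_per_offer    <- -2.6    # change in gross interest per client offered a loan (R)
pastdue_change_per_borrower <- -12.2   # change in past-due amount per borrower (R)
takeup_rate                 <- 0.074   # average take-up rate

# Past-due amounts exist only for borrowers, so scale by the take-up rate
# to express the repayment gain per client offered a loan
repayment_gain_per_offer <- -pastdue_change_per_borrower * takeup_rate

# Net effect per client offered a loan
net_change_per_offer <- revenue_change_per_offer + repayment_gain_per_offer

print(round(c(repayment_gain = repayment_gain_per_offer,
              net_change = net_change_per_offer), 1))
```

The scaling step matters because the two regressions use different samples: the revenue regression covers all clients who received an offer, whereas the past-due regression covers only those who actually borrowed.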

Since our lender had no targeting objectives, our estimates suggest that neither a loan price cut nor a loan price increase offers an incentive for the lender. The cost of reducing the interest rates slightly exceeds the benefits. Furthermore, the results suggest that raising the rate would on the one hand decrease the lender’s revenue and on the other hand reduce the clients’ repayment as well.


Award: Muhammad Yunus

Congratulations, you have nearly finished the problem set. As mentioned earlier, Muhammad Yunus is one of the pioneers of microcredit and microfinance. But did you know that he was awarded the Nobel Peace Prize for his work with the “Grameen Bank”?


Exercise Conclusion

The aim of this problem set was to test the hypothesis of price-inelastic demand for consumer credit using randomized trials conducted by a high-risk consumer lender in South Africa. Furthermore, we wanted to determine the optimal short-run pricing strategy for a microlender. To do so, we evaluated a randomized field experiment conducted by a South African consumer microlender serving the working poor.

We find downward-sloping but relatively flat demand with respect to price throughout a wide range of prices at and below the lender’s standard rates (which are the rates members of our sample received on their prior loans). In the lender’s case, the cost of reducing interest rates slightly exceeded the benefits (increased gross revenue from marginal borrowing, increased net revenue from higher repayment rates).

Since our lender had no targeting objectives, our estimates suggest that a loan price cut does not offer an incentive for the lender: the cost of reducing interest rates slightly exceeded the benefits. Furthermore, our evidence shows that raising the loan price would have been counterproductive for our lender. Our results strongly suggest that raising rates would reduce repayment. Moreover, a small subsample in our experiment shows that take-up elasticities of demand kinked sharply at the lender’s standard rates, rising to well above unity, so raising rates would have decreased revenue and the lender’s client base. This contrasts with the common prescription of policymakers keen on avoiding subsidies that MFIs should raise rates. In all, we find that the lender could not have increased profits by changing rates.
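
Why an elasticity above unity implies falling revenue can be seen in a stylized sketch. It assumes, purely for illustration, a constant-elasticity demand curve and revenue proportional to price times quantity; the function name and the example elasticities of -0.3 and -1.5 are hypothetical and are not estimates from the paper.

```r
# Stylized illustration (an assumption, not the paper's model):
# with constant elasticity e, quantity scales with price^e,
# so revenue = price * quantity scales with price^(1 + e).
revenue_change_pct <- function(elasticity, pct_price_change) {
  price_factor <- 1 + pct_price_change / 100
  quantity_factor <- price_factor^elasticity
  100 * (price_factor * quantity_factor - 1)
}

# Hypothetical elasticities: one inelastic, one elastic
inelastic <- revenue_change_pct(-0.3, 10)
elastic   <- revenue_change_pct(-1.5, 10)

print(round(c(inelastic = inelastic, elastic = elastic), 1))
```

With the inelastic value a 10 percent price increase raises revenue, while with the elastic value it lowers revenue; an elasticity of exactly -1 leaves revenue unchanged.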

However, the question of whether our results apply to other settings remains open. We cannot rule out that clients’ personal preferences, other financing options, or other experimental settings would influence the outcome.

Exercise Bibliography

Books & Papers

  • Dustan, A. (2010), “EEP/IAS 118 - Section Handout 13”, Retrieved from https://are.berkeley.edu/courses/EEP118/fall2010/section/13/Section%2013%20Handout%20Solved.pdf.
  • Earne, J., Jansson, T., Koning, A., & Flaming, M. (2014): “Greenfield MFIs in Sub-Saharan Africa: A Business Model for Advancing Access to Finance”, in Access to Finance Forum, CGAP and Its Partners (No. 8).
  • Kahneman, D., & Tversky, A. (2013). Prospect theory: An analysis of decision under risk. In Handbook of the fundamentals of financial decision making: Part I, p.279.
  • Karlan, D. S., & Zinman, J. (2008). Credit elasticities in less-developed economies: Implications for microfinance. American Economic Review, 98(3), pp. 1040-68.
  • Wächter, L. (2017): “Marshall, Alfred”, in Ökonomen auf einen Blick (pp. 211-219), Springer Gabler, Wiesbaden.
  • Yunus, M. (2003): “Banker to the poor: Micro-lending and the battle against world poverty”, PublicAffairs.

Packages

Websites